Vibe-Coding an Attention Firewall, w/ Steve Newman, creator of The Curve

Steve Newman discusses the AI tools and vibe-coding workflows he uses, including an attention firewall, reading app, coding-agent dashboard, automations, and universal logging. He also covers security tradeoffs, mobile workflows, anti-tokenmaxxing, and views on AI’s broader impact.

Show Notes

Steve Newman, creator of Writely and founder of the Golden Gate Institute for AI, shares the personal AI toolkit and vibe-coding practices that have reshaped how he works. He walks through bespoke tools including an attention firewall, a reading app for surfacing new ideas, a coding-agent dashboard, workflow automations, and a universal logging system for debugging with Claude. They also discuss information security, mobile and voice workflows, Steve’s “anti-tokenmaxxing” philosophy, and his views on AI takeoff, robotics, and climate change.

Google: Try Gemini’s Nano Banana image generation model in Google AI Studio or the Gemini app to create custom illustrated worksheets in seconds, and explore the app’s quizzes and guided learning features.

Sponsors:

AvePoint:

AvePoint is building the control layer for AI agents so you can securely govern, audit, and recover every action at scale. Design trusted agentic outcomes from day one at https://avpt.co/tcr

VCX:

VCX, by Fundrise, is the public ticker for private tech, giving everyday investors access to high-growth private companies in AI, space, defense tech, and more. Learn how to invest at https://getvcx.com

Claude:

Claude is the AI collaborator that understands your entire workflow, from drafting and research to coding and complex problem-solving. Start tackling bigger problems with Claude and unlock Claude Pro’s full capabilities at https://claude.ai/tcr

Tasklet:

Build your own Cognitive Revolution monitoring agent in one click.
Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai

CHAPTERS:

(00:00) About the Episode

(03:25) Special Sponsor

(04:47) Building personal productivity tools (Part 1)

(14:23) Sponsors: AvePoint | VCX

(16:45) Building personal productivity tools (Part 2)

(17:32) Security tradeoffs and caution

(26:00) Touring the custom toolkit (Part 1)

(26:05) Sponsors: Claude | Tasklet

(29:56) Touring the custom toolkit (Part 2)

(38:01) Stack choices and dashboards

(45:12) Hooks, repos, and syncing

(58:08) Logging, agents, and tools

(01:11:18) Hard parts and iteration

(01:18:57) Mobile workflows and UIs

(01:26:19) AI-era engineering changes

(01:35:54) Software jobs outlook

(01:41:35) Thresholds, Mythos, and RSI

(01:57:07) AI and climate

(02:01:37) Golden Gate mission

(02:07:50) Episode Outro

(02:12:01) Outro

PRODUCED BY:

https://aipodcast.ing

SOCIAL LINKS:

Website: https://www.cognitiverevolution.ai

Twitter (Podcast): https://x.com/cogrev_podcast

Twitter (Nathan): https://x.com/labenz

LinkedIn: https://linkedin.com/in/nathanlabenz/

Youtube: https://youtube.com/@CognitiveRevolutionPodcast

Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431

Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk


Transcript

This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.


Introduction

Hello, and welcome back to the Cognitive Revolution!

Today my guest is Steve Newman — the veteran software engineer who created Writely, the startup acquired by Google that ultimately became Google Docs, and who is now the founder of the Golden Gate Institute for AI, the nonprofit behind the Curve conference, and author of the Second Thoughts Substack, which is increasingly popular among the AI obsessive set for its grounded, well-balanced, anti-sensationalist, and at times openly confused analysis.

We do eventually get Steve’s takes on some of the biggest open questions in AI - including how near or far we may be from a recursive self-improvement driven intelligence explosion, how far robotics will lag behind digital AIs, and whether or not AI will have a major impact on climate change - but the main focus of today’s conversation is a show and tell of Steve’s personal AI toolkit and vibe-coding practices.

I wanted to have this conversation, because having spent the last few months building up my own Claude Code powered personal productivity stack and now autonomous assistant too, I feel that though I am getting outstanding value from what I’ve built, I still stand to learn and gain a lot from seeing how someone like Steve, who’s been programming professionally since 1985, is using the latest tools.  

As you’ll hear, it unfolded exactly as I’d hoped. 

We walked through the dozen or so bespoke applications that have fundamentally re-wired how Steve interacts with the digital world, and which for me produced a number of light bulb moments when I realized how much value I’ve still been leaving on the table.

These include, among others: 

  • an attention firewall that alerts him about urgent messages without requiring him to constantly check email and messaging apps;

  • a personal reading app that attempts to flag meaningful new ideas in the otherwise overwhelming number of newsletters he’s subscribed to;

  • a dashboard that allows him to see the status of his various coding agents at a glance;

  • a Chrome extension that automates common workflows;

  • and a universal logging solution that allows Claude to debug and fix the errors that inevitably pop up.

Along the way, he also describes his strategies for information security and integrity, how he uses mobile and voice, and – my favorite: his anti-tokenmaxxing philosophy, which he sums up as “the agent’s not important; I’m important!”

Because there is quite a bit of screen sharing, this episode is probably best consumed in video form on YouTube, but I think we do a good enough job narrating that it should also work well in audio form.  And in any case, even if you don’t listen at all, I encourage you to do what I did immediately after recording.  Copy the transcript, give it to your Claude Code or OpenClaw, and ask it to identify the ideas we discuss that would most meaningfully enhance your personal setup.  

I’ve already created a new UI to help me produce the podcast more efficiently, and I’m working on a number of hooks and a personal Chrome extension, and I’ll be very interested to hear what your coding agents create for you based on this conversation.

With that, I encourage you all to subscribe to Second Thoughts on Substack, and I hope you enjoy this behind the scenes look at what one extremely accomplished builder is building now, with Steve Newman of the Golden Gate Institute for AI.  


Main Episode

Nathan Labenz: Steve Newman, once the creator of what is now known as Google Docs, now author of the Second Thoughts Substack and founder of the Golden Gate Institute for AI, makers of the Curve Conference. Welcome to the Cognitive Revolution.

Steve Newman: Really excited to be here.

Nathan Labenz: Yeah, I'm looking forward to this. I think this is going to be a fun session. We're going to cover a lot of ground, and some of it's going to be a little bit of show and tell. One of the things I've been thinking about a lot lately is people are so excited about going down the Claude Code and AI agents and all these sort of various rabbit holes that a lot of people are probably coming up with very interesting ways of working and not sharing them as much as they probably could, because it's just so much fun to do and it's so bespoke. At least for me, I find I've had on my to-do list, like, I should do an episode about my setup. And I kind of keep thinking, well, I wanna do this one more thing before I actually get it done. So finally cornered you and said, all right, I wanna see what's going on. And then there's plenty more stuff beyond that to talk about as well. Maybe for starters, tell me what you're building. You've been programming since, I was able to see on LinkedIn, as far back as 1985, with multiple companies that you started and exited, including Writely, which became Google Docs. What are you building today?

Steve Newman: So, and this is all on the side 'cause, you know, my day job is at Golden Gate Institute, and some of what I'm doing, a little of what I'm doing, relates to that, but mostly it's just kind of personal tools. So on the side, evenings and weekends, I've got something like 15 different projects going, mostly under the heading of personal productivity. Kind of the theme has been, you know, like so many of us, I've just been drowning over the last couple of years, trying to keep up with everything that's going on, in the world in general and in AI in particular. And I now know-- this is a statistic I wouldn't have known until I started building these tools. Actually, I don't have the hard number, but I get something like 50 Substack posts, other blog posts, newsletters, kind of big information items in my inbox per day, plus everyone I follow on Twitter, plus a bunch of WhatsApp groups I'm in. And I was spending, I don't know how many hours per day, just trying to read, just to keep up, let alone synthesize that, let alone do anything else. And so the theme of most of what I've been building is managing that workload. And the first thing I built, and this is something I'd been dreaming about and muttering about for a long time, was something to just summarize, and not because a summary is as good as the original, but more to tell me what to read. Like, you know, I don't know how many posts I'm going to get over the next few days about Opus 4.7, and I don't need to read all of them. And so pretty much the first thing I built was just an RSS reader that takes all the Substacks and other newsletters and podcasts and a couple of other things, and pre-computes a summary for each one, actually kind of two levels of summary for each one. And so every morning I can glance through that, glance through the summaries, and decide which of these am I actually going to bother reading?
Is this a new take on 4.7, or is it basically the same as what I've already read? It's especially helpful for a podcast, where there can be an interesting topic, and it may or may not have an angle on that topic that I haven't seen before, and the summary can be really helpful for that. So distilling the information flow is the first theme. The second one, which took me a while-- like, I sort of had a vague inkling there was something I wanted and I couldn't quite figure out what it was, and it finally crystallized for me-- is giving myself focused time back. Every day I get a few hundred emails, Slack messages, WhatsApp messages, whatever, only a fraction of which really need my prompt attention, but a few of them do. And so I was in the habit of, probably 30 times per day, every time my brain came up for air from a task, I would check my e-mail, check my Slack chat. I had about five apps I would rotate through, which is lots of opportunities for me to get distracted by seeing something I actually didn't need to see for a few hours. And so a much bigger and more complicated project-- really, it's about five sub-projects-- is something that pulls in all the e-mail, Slack, Signal, WhatsApp, and so forth, and feeds them through. The pulling in is a big complicated mess with lots of integrations, including with some services that didn't really want to support that, like WhatsApp. Then there's like one line of code to hand each message to an LLM and say, is this urgent or not? And I've accumulated about a one-page rubric, you know, gradually, exception by exception. And then the timely ones pop up on a second monitor that I purchased for this. I've gone like 40 years of engineering without a second monitor, and I finally bought one to have a rolling view of my calendar and this list of urgent messages.
And the idea is that those are the only things I need to look at other than whatever I want to be focusing on right now. So it's sort of an attention firewall.
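
The "one line of code to hand each message to an LLM" step Steve describes might look roughly like the sketch below. Everything here is hypothetical: the function names, the rubric text, and the `fake_llm` stub (a real version would call an actual model API such as Anthropic's, with the accumulated one-page rubric in the prompt).

```python
# Sketch of a per-message urgency triage step, as described in the conversation.
# All names and the rubric are illustrative, not Steve's actual code.

RUBRIC = """\
Urgent: messages from family; anything about today's meetings;
payment problems; account lockouts.
Not urgent: newsletters, receipts, social notifications, FYI threads.
"""

def build_triage_prompt(rubric: str, sender: str, subject: str, body: str) -> str:
    """Assemble the one-shot classification prompt for a single message."""
    return (
        "You triage incoming messages. Using the rubric below, answer with "
        "exactly one word: URGENT or LATER.\n\n"
        f"Rubric:\n{rubric}\n"
        f"From: {sender}\nSubject: {subject}\n\n{body}"
    )

def classify_message(call_llm, sender, subject, body, rubric=RUBRIC) -> bool:
    """Return True if the model judges the message urgent."""
    prompt = build_triage_prompt(rubric, sender, subject, body)
    return call_llm(prompt).strip().upper().startswith("URGENT")

# Stand-in "model" for illustration only: keys off a phrase in the message.
def fake_llm(prompt: str) -> str:
    return "URGENT" if "security alert" in prompt.lower() else "LATER"
```

Each exception Steve mentions would then become one more line in `RUBRIC`, with the classifier itself staying a single LLM call.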

Nathan Labenz: I like that phrase, and I love the idea that you had a very practical and experiential sense of what you were trying to accomplish. For me, it's less time in the chair, more time exercising, and more time outside. But also, I need to sort of square that with not losing track of what I'm doing. So I think that's really helpful: to try to get concrete in envisioning, like, how is my life going to be different if this project is successful, lest we fall into optimizing the agent setup for the agent setup's sake, which I think is obviously very alluring.

Steve Newman: I've forgotten because this was weeks ago, or probably a couple of months ago, but there was a period where I was spending a lot of time on, yeah, like, Claude Code skills. And that, somewhat to my surprise, has settled down. And I'm sure it will unsettle again. But I'll also say, like, I was probably three quarters of the way through building this attention firewall thing before I understood what I was building. You know, I was very much fumbling in the direction of something and, like, had to do a lot of iteration before I was able to crystallize it. And I think that's a lot of what we're all collectively-- we're all fumbling. There's no playbooks here, right? I think you said something about this a minute ago. We're in the middle of a Cambrian-- we're all doing our individual components of the Cambrian explosion and making things up as we go along, and only understanding in hindsight or mid-sight what we're doing. I would say that won't settle down for a long time, except, of course, actually it will never settle down, because by the time it would, there will have been five new inputs and we'll be in the next round of chaos.

Nathan Labenz: One big question I have on both of those projects is how are you handling context? With the newsletters, there's sort of a question, I guess, of like any individual newsletter, you could say, Will this be of interest to me? Here's my interests, scoring on that basis. So how are you handling context? And there's obviously multiple dimensions or layers to that, but at least two that jump to mind are like, how do you make sure that the information is being filtered effectively against like what it is you care about, want to learn about, et cetera. And the other is sort of when you have so many things coming in and there's so much duplication, How are you managing to cross-reference these things against each other to try to tease out like what is genuinely new from each bit?

Steve Newman: Yeah, you know, it's interesting. I'm not. It's the dumbest possible thing. This tool literally just takes the full text of each Substack post or podcast transcript, dumps it into an LLM, and says, summarize this. And I've done a little bit of iteration on the exact prompt, like, you know, what kind of information I want it to surface and what I don't, but it's completely static, no context at all. Last year, I'd been sort of envisioning this a little bit, and I was thinking in terms of, yeah, like, I wanted to know everything I've already read so it can identify what's new and so forth. And I didn't bother with that in the first iteration, and I haven't been motivated to do anything about it. So it just gives me the summary. Again, with a little bit of finesse-- like, I think the prompt says things like, surface any novel ideas. But that's novel against the LLM's training data, what the LLM from first principles thinks is novel. And obviously, it would be better if it could contrast that with what it knows I've already read. And to my surprise, it just hasn't been-- like, I know what I've already read. I can skim a one-paragraph summary in about 10 seconds, and that's efficient enough. You know, there's a whole other side of this, where the road a lot of people are going down, at least seemingly, and I've not gone down at all, is actually, like, responding to emails, or just kind of, you know, acting on the content of my, at least, digital life. And I haven't gone down that road at all. I know a lot of people are. And partly because it just feels a little daunting, both in the complexity of the project and the security concerns it brings in and so forth. And I'm sort of conservative by nature. I don't like to use a tool unless I really know I can trust it and I understand what it's going to do.
So that does feel like a pool that's going to be so worthwhile to jump into that I will eventually find myself forced to, but I haven't gone there yet.
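
The "dumbest possible thing" pipeline Steve describes -- full text in, static prompt, summary out, at two levels of detail -- might look roughly like this. The prompts and function names are illustrative guesses, and `fake_llm` stands in for a real model call:

```python
# Sketch of the two-level summary pre-computation described in the conversation.
# Prompts and names are hypothetical, not Steve's actual code.

SHORT_PROMPT = (
    "Summarize this post in one paragraph so I can decide whether to read it. "
    "Surface any novel ideas:\n\n"
)

LONG_PROMPT = (
    "Write a roughly one-page summary of this post with these sections:\n"
    "Novel ideas\nNotable evidence\nKey arguments\n\n"
)

def summarize(call_llm, full_text: str, level: str = "short") -> str:
    """Pre-compute a summary: 'short' = one paragraph, 'long' = ~one page."""
    prompt = SHORT_PROMPT if level == "short" else LONG_PROMPT
    return call_llm(prompt + full_text)

# Stand-in model, for illustration only: reports which prompt it received.
def fake_llm(prompt: str) -> str:
    return "one-paragraph summary" if prompt.startswith("Summarize") else "one-page summary"
```

The long prompt's section names are what generate the "Novel ideas" and "Notable evidence" headings Steve shows later in the demo; note there is no reading-history context in either call, matching his "completely static, no context at all" description.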

Nathan Labenz: I'm not conservative by nature in general. My attitude on computer security has historically been borderline negligent. But this has changed my mindset. In the past I've always been like, do I really have anything that valuable? Or, you know, I'm not like a big target, who cares? But now, as I give AI access to not just everything that I've ever written, but like everything anybody's ever sent me, I feel a certain kind of duty of care to guard their information, you know, that they trusted me with and never really thought was going to be going into some AI that hadn't even been contemplated at the time it was sent. I've had my same Gmail account for 20 years. So that definitely has caused me to slow down and take a more deliberate approach, to try to figure out, like, under what circumstances do I give how much access, and when do I want the thing to kind of draft something for me versus when do I think it might be more helpful for it to try to play the role of an assistant? And I'm definitely still feeling my way through a lot of that stuff as well. But it is striking that I feel compelled to take my time, when usually I would just sign up and let it rip on just about any other software experience in the past.

Steve Newman: Yeah, and I hear you. And it's a great point that, you know, your data is also other people's data. And the trade-off between security and utility is really building, right? Like, you know, it's getting really-- you know, with OpenClaw and everything, or I don't even remember what we're supposed to call it now. And I keep waiting for a shoe to drop there. I keep waiting for the stories of people really regretting their life choices around this. There's been the one or two anecdotes that circulate, but hardly anything. And you have to think those are juicy stories. If people were really getting burned by prompt injection or whatever, or just bots deleting production databases, deleting your e-mail history or whatever-- again, there's one or two stories, but I've only seen a couple. So it's hard to explain why there haven't been more problems, other than maybe it's harder to exploit this stuff than you'd think. And you don't even have to have malicious problems; you can also just have sort of overeager bot problems. And I think probably part of it is the model developers and the tool developers are working on it-- they're adding classifiers and whatever. I haven't followed it closely, but it feels like every new model report card says we've reduced prompt injection susceptibility by another X percent or whatever. Somehow we're keeping ahead of the curve. And yet at the same time, everyone agrees that fundamentally, this whole system is totally insecure and broken if you trust it with anything. And so I don't understand how that tension is going to resolve. I think this is going to be very interesting to keep following. But meanwhile, I kind of feel like the guy in Raiders of the Lost Ark: asps, very dangerous, you go first.

Nathan Labenz: Yeah. Yeah. I mean, this is happening at like every level, right? At the model level, obviously with Mythos, we see greater utility, greater security concerns. When you give access to tools, it's the same thing. Even upgrading software has suddenly become this kind of weird damned-if-you-do, damned-if-you-don't situation, because there are supply chain attacks that are starting to get scary. So I've seen people say, don't update anything until the package is seven days old. But then the flip side of that is, if we're patching critical vulnerabilities that just got discovered, you want those patches fast. And so now do I have to keep track of all these dependencies? What a nightmare. So yeah, I don't know. It is weird. I think you put your finger on something there. I very much associate this style of thinking with you, of coming at these core questions from both perspectives-- and, you know, it's in the title of the Substack, Second Thoughts, as well. And just a lot of times, seemingly, we end up kind of confused. Like, there's not great answers. We could probably touch on a number of those things as we go. But are you personally just in a total state of confusion when it comes to this? I mean, I think the security vulnerabilities are pretty real and pretty obvious. And we've even talked about this a little bit offline, in terms of, like, why aren't we seeing more phishing scams? I feel like I have seen a little uptick recently in a couple of sophisticated, seemingly scammy emails coming my way, but not nearly as much as one might have thought. And the same thing is true with, like, election deepfake things-- you know, that didn't really happen. Do you have a story for any of that, or are you just kind of still confused about it?

Steve Newman: Mostly confused. You know, I think you could argue that there's just sort of a lot of precedent that bad guys can be just as slow to innovate and adopt as anyone else. And, you know, it's easy to point back-- like, I think it was in the early '80s-- do you remember the Tylenol scare? There was this incident where someone, if I'm remembering correctly, put cyanide in a small number of pill bottles or containers on store shelves. I don't remember whether they walked into the store and tampered with them there or exactly how it happened, but a handful of people fell ill. I think there were a couple of fatalities. And to this day, that's why so many products you buy have the safety seal on them, a little plastic wrap or whatever. And anyone could have done that at any point in the last however many hundred years. It didn't happen until the '80s, and then it happened once. It could still happen. There are plenty of things you buy at a store and put in your mouth that don't have that safety seal, whether it's produce or whatever. You know, in every walk of life, there's sort of so much low-hanging harmful fruit that I don't entirely understand why most of these things don't happen. I'm glad that they don't. And so one theory is that whatever complex sociological factors are going on there continue to apply here. Now, that's a little hard to completely believe, because we also have a lot of sort of opposite case studies in cybersecurity: like, if a server is vulnerable to a well-known attack, some script kiddie is going to get in there, or some bot is going to get in there.
There are definitely systematic bad things that happen on the internet, going back decades. I think the statistic was, like, if you just took an unpatched installation of Microsoft Windows and connected it directly to the internet, it would be owned within five minutes or something-- that goes way, way, way back. So I don't know how to reconcile those two patterns of the world. And if anyone can shed light on this-- I think it's a very important question to ponder, but I don't actually have any insight into it.

Nathan Labenz: One kind of AI-specific story that I find at least somewhat compelling is simply that if you're good enough at AI to scam people effectively with LLM-generated phishing attacks, you could probably make honest money in a, you know, similarly easy way, because there obviously is a ton of demand from legitimate businesses for people who can make it work reasonably well. So I find that at least somewhat persuasive for the moment. Um, but a lot of it does still kind of remain a mystery overall, I think.

Steve Newman: Yeah.

Nathan Labenz: How would you like to show us some of your stuff? I think the extension, or the kind of corollary, of my intro is that people should watch other people use computers more. I feel like in particular, folks who started programming in an era where there were very classic editors and lots of command line and cron-job type of stuff seem likely to me to have a kind of advantage, or a little bit of a different paradigm, that now suddenly becomes more relevant again as we're all using command line tools. And most of us, myself included, had like very limited familiarity with or attraction to that modality before. So I'd love to just peek over your shoulder for a minute, if you wouldn't mind, and learn a little bit about how you actually use AIs, and, I guess even more generally, how you use the computer.

Steve Newman: Sure, yeah, let's go for it. So, okay, share screen. So I thought I'd start with showing off a few of the applications I was talking about before. And I've got them all lined up in tabs here. So this is that feed reader I was talking about. This is the current live view. And fundamentally, there's very little to it. This is basically the only screen I use. It's the dumbest possible thing. It's just a list of posts. And it looks like we're getting demo disease, because-- yeah, so the last few posts, I must have just broken something. They should all get summaries within about a minute of coming in, but for the last hour, they haven't. But the older ones have them. So this is the summary I was talking about. And again, kind of my workflow here is, whenever I have a little idle time and I want to distract myself, I run through this and I see-- I might go through in order, or I might jump around-- and I look at: is this something I want to read? And I can either click on it, and that'll just open the original, or the main thing I'll do is I'll go over here and hit Archive. This is an example of iteration-- something I thought I would want to do, and almost never do. So this is an icon that will open a Claude session with that article in context, so I can ask questions about it. And I actually forgot this feature was there until just now, because I haven't been using it. But something I will do sometimes-- so, Overview. This is also pre-computed. It's also just a simple LLM prompt, summarize this post. But it's a different prompt that generates a longer, about one-page summary. And it specifically says, tell me the novel ideas here. Tell me the notable evidence. So I basically gave it prompts that correspond to these section titles. And mostly, I either look at the first summary, and I'm either going to read the article or I'm not.
But if I'm either on the fence, or I feel like it's not worth reading but maybe it is worth getting a little-- like, this is my sort of 80/20 alternative to reading the post. And so that's pretty much it. There's a bunch of other stuff in here, all of which is just infrastructure to keep the tool working, almost none of which I would have bothered to implement if I had to do it myself instead of having an AI do it. So, like, every day it dumps a backup of the whole thing into R2, which is Cloudflare's version of S3. You can view the backup-- this is a pretty-printed view of the information in the backup. I would have never in a million years bothered to implement a pretty-printed backup viewer. But that was one sentence in one of my prompts. And this helps me be reassured that the backups are really working. And yeah, this was a bunch of machinery for importing all of my Substack subscriptions, which involved vibe coding a bookmarklet to, like, rip apart my Substack subscriptions page HTML, because there wasn't another good way to get the list of my Substacks. You know, the feature set is much longer than I would have bothered to implement myself, which is one of the interesting things I found. But basically, this is the whole tool. And then the one thing I'm gonna show you screenshots of, instead of the live thing, 'cause the live thing can be sensitive, is the...

Steve Newman: This is basically the attention firewall. I named it Radar for the old MASH character, Radar O'Reilly, who, you know, was just always there the moment you needed him with the information you needed before you needed it. So this is the three... This is what is on... Can you see my mouse? Yep. Yeah, so this is what's on my second monitor, so it's a three-hour view of my calendar. And another theme here is you can just customize everything. So this is not actually my entire calendar. I've given it rules about... There'll be things on my calendar that are just to block off time. It's not a thing I'm gonna do, it's a warning to others not to book that time or things like that. Or it's like my wife's calendar that will show up in my Google Calendar view. So this is the... idiosyncratically distilled, filtered version of my calendar with a bunch of shortcuts. Like if I wasn't missing this call to do the podcast, you know, I can go right here. And without even opening the calendar entry, that's the button to join the Google Meet. That's the button to open my private notes of what I would discuss in this meeting. And that's the button to open the shared document and the team we would all be in during that meeting. So these are all just little vibe-coded rules, like it knows if the calendar entry has this title, subject line, that's a recurring meeting we have internally, and this is the doc I always want to have in front of me when we're on that call. So just lots of little idiosyncratic things like that. And then the other half of it is the attention firewall. So these are the classifications of events. A lot of them-- like urgent and midday are zero because it's easy for me to stay on top of those now. And so for each one, it's kind of like the inbox view, but with a summary on each one. And if I click on one of them, I get this ugly little toolbar that's full of keyboard shortcuts. And again, idiosyncratic, like forward means forward to my wife. 
Because we're on a lot of shared, like, you know, Amazon and PayPal accounts and stuff, and I'll get, you know, notices that actually she would care about. So one button, like, no, you know, forward and archive. And one other theme here-- I'm pretty fast and loose in the way I develop this stuff. Like, everything-- I have no staging environment, everything-- like, whatever automated tests my exhortations have caused Claude to build, which I don't know. It says it has a lot of tests and it says it runs them, and I kind of believe it, but I just push to production all the time because I keep the stakes low on all this stuff. Like, all my messages really live-- they're where they've always been in Gmail and WhatsApp and Slack. This button-- like, if I want to reply-- I can do lightweight replies here, but if I want rich text formatted reply, or anything even slightly complicated, this means move it to my Gmail inbox. But it's always been in Gmail. This is just a label change in Gmail. So if this totally falls down, if Claude has the bright idea to delete my production database or whatever, Everything is still where it's always been. I just lose the nice interface to it. And by the way, that's never happened yet. In the couple of months I've been working this way, it feels much longer.
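
The "move it to my Gmail inbox" button Steve describes is a nice illustration of the keep-the-stakes-low design: since every message still lives in Gmail, the tool only flips labels. A sketch of what that could look like against the Gmail API's `users.messages.modify` endpoint (the custom label id and helper name below are hypothetical):

```python
# Sketch of restoring a triaged message to the Gmail inbox via a label change,
# as described in the conversation. Label ids here are made up for illustration.

FIREWALL_LABEL = "Label_radar_triaged"  # hypothetical custom label id

def move_to_inbox_body(firewall_label: str = FIREWALL_LABEL) -> dict:
    """Build the modify-request body that puts a message back in the inbox."""
    return {
        "addLabelIds": ["INBOX", "UNREAD"],
        "removeLabelIds": [firewall_label],
    }

# With google-api-python-client, the actual call would look roughly like:
#   service.users().messages().modify(
#       userId="me", id=msg_id, body=move_to_inbox_body()
#   ).execute()
```

Because the modify call only changes labels, a misbehaving agent can at worst scramble the custom labels; the messages themselves stay in Gmail, which is the failure mode Steve says he's designing for.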

Nathan Labenz: Knock on wood.

Steve Newman: Yeah. And then you can see every one of these tabs is like some other part of the toolkit. I don't know how much of this we want to go through, but it was just like, it's so easy to build tools. And so I just keep adding to the pile.

Nathan Labenz: I'm interested to go through some more at least, 'cause I do think people, again, just benefit from seeing what other people do and get inspiration from it. Maybe a couple questions to kind of prompt you as you go a little deeper. When you're coding, is it all Claude? Is everything Claude? Is there a place for Codex in your workflow? And when you talk about how all the originals are still in their place, is there a sort of shadow database that you're pulling them into, where they're also stored? Or are you just kind of doing a runtime call to get the most recent stuff? And, oh, I had one other one. Oh, and why Cloudflare? What is it that you like about Cloudflare specifically?

Steve Newman: Yeah, so I'll answer the last one briefly, because otherwise I'll forget. I think basically I had some long conversation with... This was a key decision at the beginning: where to host. I'm pretty sure I hit Gemini, ChatGPT, and Claude: I want to build a set of web apps. I'm an experienced developer, but I don't want to get my hands dirty and I'm kind of rusty. This is the kind of stuff I want to build. I don't care too much about cost because it's only one user. I gave it a whole bunch of context. And what stack should I use? Hosting provider, programming language, front end, back end, CSS library, whatever. And I let all three of them spew, and then I pasted each output into the others and had them critique-- a level of effort that I don't normally bother with. And by the way, I've more recently built a skill to automate that, which I had forgotten I'd built, and I need to use that more. We have such an embarrassment of riches of both tools other people have built and tools we've built for ourselves now. But anyways, I did all of that, and basically that rose to the top. Also, I had kind of the idea in my head that anything Cloudflare does, they probably do pretty well. I wouldn't defend that, but that's just sort of my spidey sense from things I've read on Hacker News over the years. And I've been happy with it. There's enough of a toolkit there: it has process hosting, it has cron jobs, it has queues, it has databases, whatever. It has just enough of a toolkit to do everything I want, but so much less complexity than something like AWS, and mostly pretty cheap for small-scale usage. And then, yeah, so why don't I just kind of speed-run through the suite here, and then talk about the development process. So this is a tool, very specific: I pointed it at a Hacker News comment thread.
And it has this whole workflow where it reads the linked article, reads all the comments, identifies themes. So this is-- I pointed at-- I haven't read this, what we're looking at yet, but this morning, I pointed it at the 4/7 announcement discussion on Hacker News. These are themes that emerged in the Hacker News comments. People complaining, as people always do, about an existing model getting dumber, you know. Should you-- I mean, you can see these here, I haven't read them. But so it identified themes, gives a summary of the article, a summary of the overall comment thread, and then I can click on one of these themes and see all of the comments that fell under that theme. And a comment can be tagged under multiple themes. So I don't use this all the time, but sometimes it's handy, 'cause, you know, those threads can be very heterogeneous. There'll be a section that's really interesting and another whole section that's about something I don't care about, and it's hard to find the-- the part, and it's all entangled together. And by the way, again, you know, idiosyncratic: there's all kinds of integrations that I've built because they're so easy to do. So here, I'll open up Hacker News, I'll click on something, and I don't know whether you can-- it looks like you can see this. So this is a Chrome extension where I can-- it says saved in Notion, but the word Notion there is kind of out of date. This takes the current page and I can add it to my to-do app that I'm going to show you in a moment.
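For listeners who want to build something similar: the fetch-and-group stage of a thread summarizer like this can be sketched in a few lines against Hacker News's public Algolia API (`hn.algolia.com/api/v1/items/<id>`, which returns the story and its full nested comment tree in one request). The `classify` callback here is a stand-in for whatever LLM call assigns theme labels; everything else is just tree-flattening.

```python
# Sketch of a Hacker News thread summarizer's first stage: fetch a story's
# nested comment tree, flatten it, and group comments under themes.
# The Algolia endpoint is real; classify() stands in for an LLM call.
import json
import urllib.request

HN_ITEM_API = "https://hn.algolia.com/api/v1/items/{}"  # public, no auth needed

def fetch_item(item_id: int) -> dict:
    """Fetch a story plus its entire comment tree in one request."""
    with urllib.request.urlopen(HN_ITEM_API.format(item_id)) as resp:
        return json.load(resp)

def flatten_comments(item: dict) -> list[dict]:
    """Depth-first walk of the nested `children` arrays into a flat list."""
    out = []
    for child in item.get("children", []):
        if child.get("text"):  # skip deleted/empty comments
            out.append({"id": child["id"], "author": child.get("author"),
                        "text": child["text"]})
        out.extend(flatten_comments(child))
    return out

def group_by_theme(comments: list[dict], classify) -> dict[str, list[dict]]:
    """classify(text) returns a list of theme labels, so one comment
    can land under multiple themes, as in the tool described above."""
    themes: dict[str, list[dict]] = {}
    for c in comments:
        for label in classify(c["text"]):
            themes.setdefault(label, []).append(c)
    return themes
```

From there, the per-theme comment lists and the article text would be handed to a model for the summaries.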

Steve Newman: I can add it to the Reader app that I just showed you, or-- I have, like, certain sections of my Notion tree whitelisted into here. Mostly, I have a page full of subpages of notes on blog topics I might write about someday. And so there's a whole little set of heuristics for kind of what gets shown here, and I can type and filter and whatever. And if I throw this into Reader, then there's a special rule that says if I put a Hacker News discussion thread into the Reader queue, it will feed it into this tool, and, you know, this takes a few minutes. So in the background now, it's building this summary. Okay, speedrun. So this is a view of my Gmail spam folder, sorted by-- actually, I don't remember the sort order, but it's not by date, it's by these columns in some order, which I find is super handy for skimming through it. Like, for whatever reason, I get a lot of emails that are addressed to me@aol.com, which, spoiler alert, is not my actual address. So when I'm scrolling through this, it's very easy to, you know, jump past that entire... This is just a little idea I had one day: it'd be easier to plow through my spam if it were sorted. So, implemented that. This is the other thing that goes on the second monitor. It's the status of each of my coding agents. So blue means it finished a task. And this is based on-- there's like eight different pieces to this, held together with, you know, baling wire and whatever. So there's a little web app that's running this web page. There are Claude Code hooks that are reporting to the web app when status changes. If I click on one of these buttons, it will open that terminal tab, which involves AppleScript and something called Hammerspoon, which is a macOS utility that I don't even know what it is, but Claude told me to install it and I believed it, that somehow glues these things together. And there's a Chrome extension in here somewhere-- oh, so that I can do screen shortcuts. I hit Control-J.
Oh, you can't see it here. This only works in Safari, which is not the window you're looking at. You can't see it, but there's a Command key I can hit, and then these light up with numbers 1, 2, 3, 4. And if I type that number, it'll open that terminal tab. The red ones are there's no active agent, but in my to-do list app that I'm going to show you in a moment, I have to-do entries for that app. So it's also integrating with the database from that other app. And there's other statuses for the agent is busy working or it needs to ask me a question. So that's that one. And there's not much to this. That's my entire agent management toolkit. It's just this one little thing with the color-coded buttons. But again, it's sort of the attention firewall thing. I don't need to look in on my terminals all the time. I can glance over at this on my second monitor, and my brain has already internalized the color-coding, and I know whether anything needs my attention or not.

Nathan Labenz: And just to make sure I understand the structure of that, because that's maybe something I want to do. I don't really use hooks much at all. I think hooks and cron jobs and these various things are, like, not instinctive for me. So tell me a little bit more about the hook. Like, it's sort of, when Claude finishes, it reports to this app in the cloud what its status is?

Steve Newman: Exactly. That's it. And this is the only use of hooks I'm making that I can remember. And I don't understand much about it. You know, I basically had a little conversation with Claude at some point: I want a view of which agents are busy and blocked. Let's brainstorm ideas. I don't remember whether it suggested hooks or I did. And it said, yeah, I can do this with hooks. And it's not perfect-- hooks don't quite get enough information to do a perfect job of this, but it works well enough. So the short version is it's hitting-- I've got this cloud app, it exposes an API. So the hook just runs curl or something to hit the API. And then there's a server push notification from that web app down to this web page. And so I get real-time little color updates as the agent starts and stops.
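For anyone who wants to replicate this: Claude Code hooks are registered in its `settings.json`, and a hook is just a shell command, so "curl an API when the agent stops" is a few lines of config. The endpoint URL and JSON payload below are hypothetical, a sketch of the shape rather than Steve's actual setup:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "curl -s -X POST https://agent-status.example.com/api/status -H 'Content-Type: application/json' -d \"{\\\"project\\\": \\\"$PWD\\\", \\\"state\\\": \\\"done\\\"}\""
          }
        ]
      }
    ]
  }
}
```

A similar entry under the `Notification` event could mark an agent as blocked on a question, and the web app then pushes the color change down to the dashboard page.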

Nathan Labenz: Is this like a bunch of different repository? I mean, I assume this code lives on GitHub as well. How do you organize it into--?

Steve Newman: Repositories or a mono repo? Yeah, so there's about 15 projects. You know, each app I've been showing you is its own project. And by the way, I deliberately broke it down into projects, A, to keep the context manageable for the coding agent. And it's not like I tried different approaches and found that this was best; it was just my gut. So my gut was, like, microservices, basically: keep each project as small as possible so the context is manageable, which was probably more important in the ancient days of January when I started this than it is today. Each has its own GitHub repo. Yeah, they're on GitHub. Each has its own database backend and its own little project or whatever in Cloudflare. So they're somewhat isolated from one another, but they invoke each other's APIs a lot. And I hadn't really planned this, but they're all on my hard disk-- in fact, on my Mac, all the Claude coding is in a Docker container. So I run Claude in dangerously-skip-permissions mode, but it's in a Docker container. All of these projects have their repo directories next to one another under one home directory in the Docker container, and so they can see each other's code. At first, if there needed to be an API update, I would, like, ask the one agent to tell me what I should tell the other agent, and I would manually copy it. But then I realized I can just, like, okay, reach over into that other app, give it the API and use the API or whatever.

Nathan Labenz: Yeah, that's interesting. I think I'm following a fairly similar pattern, although probably in keeping with my generally less structured personality, I kind of start with everything in one repo, but then over time, especially if I want to share something, then I'll split it out into a separate repo. So anytime I want to just try something, it kind of goes into the sort of personal, private mono repo, but those things are at some point, like, promoted.

Steve Newman: How big is that mono repo?

Nathan Labenz: I mean, the big thing is for me, context exports, because I have gone basically five years back into all these different channels. And that has added up to roughly a gigabyte with all...

Steve Newman: Oh, so you have a lot of content in that repo?

Nathan Labenz: Yeah, and I'm not sure if that's quite the right way to do it either, but I did want to have some backup and I was like, where, you know, where should I make this? It could be Google Drive, or, I don't know, I just decided to go with actually putting it into GitHub. It's, like, you know, Git LFS, large file storage or whatever that's called. Aside from that, it's not that big. You know, I would have to run a script to know how many lines of code or how many tokens or whatever it is, but it's manageable besides the, yeah, besides what I call the deep context database.

Steve Newman: Yeah, yeah. So the repos-- my GitHub repos only have code. All the data is in Cloudflare databases. People talk a lot about how the agents really understand how to look at file systems, and that does sound like a good approach, but I just haven't tried it.

Nathan Labenz: I actually have it as a SQLite database that just runs locally. So it is probably similar in terms of, like, you know, it's a query action for the agent. And another thing that I do that's kind of similar is I have sort of the top-level Claude file point to other Claude files, so that if it needs to kind of know what's going on in another project, it can still go see that. I'm trying to mostly make that kind of one-directional, because some of the projects that I do split out, I want to collaborate on. Like, one salient one is the tools for the production of the podcast. That started off intermingled with my personal e-mail and history and communications and all that stuff. And I was, okay, well, I want to share this with a couple people so we can work together on it. Obviously, I don't want to share my entire e-mail history, not because I don't trust anyone, but just because that's what the people who've sent me those emails, I think, would want me to do. And so splitting them off, then I'm like, okay, how do I want Claude to... I don't think this is, like, robust security, to be clear. But if I start Claude in the main one, it has pointers to the other one. But if I start it in the other one, it doesn't have obvious pointers up. It could, you know, look around and get outside of its immediate view, but it's at least not, like, instructed to do that. And then also, if somebody else clones that separate repo, then they're not going to have the main one anyway. So, you know, on their computer, it just wouldn't have that kind of access. But okay, cool.

Steve Newman: Helpful. And then, yeah, and so then you asked, like, am I mirroring data, right? So yeah, one of the projects I built is called Mirror. By the way, I'm only just now realizing that with everything I've been talking through, you know, half the listeners will only be listening and not seeing. So I'll describe a little more. So this is just listing all of the different data things I've integrated with to pull data out of. So it's pulling my Google contents-- sorry, Google Contacts, Google Calendar, Gmail, WhatsApp, Slack, Twitter, Google Docs, Signal, and SMS messages off of my phone. And so this is, again, the kind of thing I would have never built myself. There's this whole status dashboard of how many records have been imported and how's that going and whatever. I don't even look at this. I only used this briefly when I was first getting it working. So the idea is, now I'm building up this rich database of all that content that I can use for context. I haven't actually done much with it other than driving the real-time inbox view. But, you know, there's a whole web UI for searching and, you know, viewing backups and all this toolkit, which again, I used briefly while I was troubleshooting to begin with. Then it's done.

Nathan Labenz: Is that a real-time view, by the way? Is it polling? I mean, because I've done some of these integrations and some of them have been quite painful. I use Beeper Desktop to try to, like, aggregate a half dozen or so of them. That's also kind of painful. I feel like Beeper Desktop is a good idea, but it crashes a lot for me. And I haven't gone as far as getting pings in. So if I want to take inspiration from this and I'm like, oh, how do I get to a real-time view? What I currently have is a batch process that I run a couple of times a day to kind of update my database. What have you found in terms of what actually works well for getting closer to real time, if not fully to real time?

Steve Newman: So this has been far and away the hardest part of everything I'm presenting, and a lot more grief-- like, far and away the biggest time sink-- and there's eight different solutions, and, you know, each one was different. It is generally pretty real-time, and I pushed to get that, 'cause I wanted to have this attention firewall inbox view. So, like, Gmail-- there are great APIs, they're a pain to use, but that's Claude's pain, not mine, and so, yeah, there's some kind of real-time sync. And I think the calendar is also-- Google provides a sync API for the calendar. I don't even remember about contacts, but that doesn't need to be real-time. WhatsApp-- there are a bunch-- as it sounds like you've found, there are lots of bad solutions for WhatsApp, and it's hard to find a good one. But it turns out-- and I hope that no one from Meta is listening to this-- but what I'm about to describe is not a secret, or I would have never found it out. If you install WhatsApp Desktop on your Mac, and I imagine on Windows, it is obviously syncing down all of your messages. Turns out it stores them in an SQLite database, which is not encrypted, so you can read it. So I'm just piggybacking off of WhatsApp's own real-time sync. There are a lot of WhatsApp integration solutions out there that actively talk to some internal WhatsApp API, and there's a lot of stories about people getting their accounts banned, and other stories of people saying, I don't know what you're talking about, it's fine. I don't know exactly how dangerous it is to use those kinds of hacks, but I didn't want anything dangerous. But this way, you know, there's no way for WhatsApp to know that you're doing read-only reads of its database file. So I have just a cron job that runs probably once a minute, I don't remember. And this is on my Mac. So again, you know, there's pieces of this everywhere.
There's inside the Docker container, there's outside the Docker container, there's up in the cloud, there's on my phone. So this one is running on my Mac, outside the container, looking at the WhatsApp SQLite database. Slack is another API integration. Twitter is the worst, hardest one. I found some very shaky, fly-by-night-- fly-by-night's an exaggeration, but some very kind of shady, borderline-looking service that will let you query Twitter. And I don't know how they do it and I don't want to know. It's not like reading my feed, which would be nice. I have to manually poll every individual who I follow, but that's only about 80. And so this is the slowest one. It probably rotates through them about once an hour. It would start to cost noticeable money in API fees if I were polling more often than that. And this was, like, glitchy to get working. But they're all working now-- like, you know, it's not like they break every day. Google Docs, again, is an API. Signal, I don't remember. And SMS is an Android app that's watching notifications on my phone. Signal might be similar.
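The WhatsApp trick above boils down to a cron job doing read-only polls of the desktop app's local store. A minimal sketch: open the database with SQLite's read-only URI mode so the mirror can never corrupt WhatsApp's data, and pull rows newer than the last one mirrored. Note the database path and the `ZWAMESSAGE` table/column names are assumptions; they vary between WhatsApp Desktop versions, so verify against your own install first.

```python
# Read-only polling of WhatsApp Desktop's local SQLite store (the approach
# described above). The path and schema names are assumptions -- inspect
# your own install's database before relying on them.
import sqlite3
from pathlib import Path

# Assumed location for the Mac App Store build of WhatsApp Desktop.
WHATSAPP_DB = Path.home() / (
    "Library/Group Containers/"
    "group.net.whatsapp.WhatsApp.shared/ChatStorage.sqlite")

def open_readonly(db_path: Path) -> sqlite3.Connection:
    """URI mode=ro guarantees we can only ever read the file."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)

def fetch_since(conn: sqlite3.Connection, last_rowid: int) -> list[tuple]:
    """Messages newer than the last mirrored row (assumed schema)."""
    return conn.execute(
        "SELECT Z_PK, ZTEXT FROM ZWAMESSAGE WHERE Z_PK > ? ORDER BY Z_PK",
        (last_rowid,),
    ).fetchall()
```

A once-a-minute cron entry would call `fetch_since` with the last seen row ID and forward anything new to the mirror's ingest API.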

Nathan Labenz: The new Twitter API, I think, is going to be probably a big hit for them, as much as it is a quite painful one to try to make work. I had a version of Twitter that was basically using a headless browser with my login, and it would sort of try to use the cookie that I had most recently logged in with for as long as possible until it expired. And then you'd have to kind of reauth and whatever. And that worked, like, okay. It wasn't terrible. Certainly wasn't super reliable. But now I'm like, you know, I might just sign up for the paid API. And even though it's going to, you know, cost me a few cents to do a few things, it's probably worth it.

Steve Newman: I didn't, I didn't realize they had an official API that would work for this. That didn't come up in my research.

Nathan Labenz: It's just been, I think, maybe the last two or three weeks, and I'm not even sure if it's, like, fully GA yet. But the big difference between their previous version and now is there's not a big fixed cost to enter. It's, I think, honestly probably pretty effectively priced for them, where it's, like, not super cheap, such that you're going to want to go really mine data. Or if you do, you're going to have a pretty good reason for doing it. But it's also not so expensive that if you want to get your own feed a couple of times a day, you'll be afraid to do that. So I think they've landed at a pretty good spot that will allow people to have access without them having to fear that you're going to run off with the entire fire hose or...

Steve Newman: What have you. Yeah, that makes sense. Yeah, so if I were doing this again, I would use that, and I may switch to it, like, the next time this one breaks. And then just kind of running through the to-do list-- like, there's stuff in here, repeating events, like repeating reminders, and all kinds of things that are idiosyncratic to me, but at the end of the day, it's not that complicated. This is just, like, a little-- I talked about wanting to put myself in a position where I can be kind of cavalier in the development and just move fast and break things, knowing that there's nothing too valuable to be broken. The to-do app would be the one where I'd be most sad if I lost the database. This generates a backup every five minutes if there's been a change. Unlike all the other backups, which are just dumps into R2 file storage, this is live-synced to GitHub. I wouldn't even need to do a restore: I could open GitHub.com, navigate down, and see my to-do list, a static version of it, if the to-do list app ever glitches. This is the payload for the Twitter integration. So it's just a feed reader with just, like, a bunch of little details, like deduping retweets and other things. And, like, it auto-expands-- instead of only showing the first 140 or 280 characters, it auto-expands everything. So just, like, little fit-and-finish things that I prefer relative to the default behavior of the Twitter app. There's some infrastructure under this. And one of the things, by the way-- I looked at this morning, it was broken, and I just told Claude to fix it. Let's see if it worked. Yes, it did. All of these apps-- part of how I support development velocity is all of these apps feed into a logging service, and I found it annoying enough to configure with commercial services that I just built my own-- had Claude build a dumb little logging service. So this is just an SQLite database hosted on Cloudflare. All of the back ends log there.
All of the browser front ends, the JavaScript code, log there. All of the Android apps log there. Like, everything logs there. And there are, like, massive exhortations and code review rules about this in my CLAUDE.md: thou shalt log errors, thou shalt log, you know, every time you modify the database, that kind of thing. And so then virtually 100% of the time-- remarkably close to 100% of the time-- if there's something wrong-- hey, this event's on my calendar, but it didn't show up in my calendar mirror view, or just whatever thing. How come those summaries-- like, none of the new blog posts that have come in since noon have a summary? I can just tell it, debug this. And it has the data to figure out what went wrong. I use-- there's a very popular Claude plugin called Superpowers, by Jesse Vincent. One of the skills in there-- I'm pretty sure this is from there-- is called Systematic Debugging. So I just tell Claude, you know, slash systematic-debugging, one-sentence description. And from my CLAUDE.md, it knows it has logs. It's been very loudly instructed to look at the logs, not to guess, but, you know, look for evidence. And it works so well. Yeah, if I had, like, one concrete here's-how-to-do-agentic-coding-well piece of advice for people-- especially, you know, sort of personal projects, as opposed to in the context of, you know, a professional software development operation where you have production practices and so forth-- if you want to go to the trouble of doing anything more than just typing prompts into Claude, like, if you're going to do one sort of infrastructure thing to make it work better, it's have everything log, including the front end. Like, all of the moving parts should generate logs in one place. It's annoying to have to build that place yourself. And there's probably a better answer, an off-the-shelf answer, but I'm just not sure what it is. And then vigorously remind Claude that it doesn't have to make guesses about what's going on.
It can go look.
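The "everything logs to one place" idea mostly comes down to agreeing on one flat record shape that every component (backend, browser front end, phone app) ships to the same ingest endpoint. A minimal sketch, where the endpoint URL and field names are made up for illustration:

```python
# Sketch of a universal logging client: one uniform record shape, POSTed
# to a single ingest endpoint that the debugging agent can later query.
# The endpoint URL and field names are hypothetical.
import json
import time
import urllib.request

LOG_ENDPOINT = "https://logs.example.workers.dev/ingest"  # hypothetical

def make_record(app: str, level: str, message: str, **context) -> dict:
    """One flat, uniform record shape for every component that logs."""
    return {"ts": time.time(), "app": app, "level": level,
            "message": message, "context": context}

def ship(record: dict) -> None:
    """Fire-and-forget POST; debugging sessions read these back later."""
    req = urllib.request.Request(
        LOG_ENDPOINT,
        data=json.dumps(record).encode(),
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=2)
```

With every moving part calling something like `ship(make_record("radar", "error", "calendar sync failed", event_id=42))`, a "look at the logs, don't guess" instruction in CLAUDE.md has real evidence to point the agent at.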

Nathan Labenz: Yeah, cool. I mean, there's a couple-- there's quite a few, but the amount of front end.

Steve Newman: That you've made-- I think there's more-- this is a summary of all the-- a cron job gives me a daily summary of everything in the logs, but those are some of the high points.

Nathan Labenz: So how many-- when you're actually sitting down to code, how many agents are you running in parallel? I'm getting the sense that it is all Claude.

Steve Newman: It's all Claude. It's kind of like-- in some ways, it's a very simple vanilla setup. I'm using the built-in macOS Terminal app. I basically have one window with one tab per project that I'm-- you know, I open the tabs when I need them and close them if I haven't been working on that project for a while. And then if there's anything going on in that project-- either the agent is working, the agent is blocked on me, or I have an entry in the to-do list for that project-- then it has a bubble here. And so when I'm coding, or when I'm Clauding, which is not all that often these days-- it was more... I went through a period of about a month or two where it was pretty vigorous, like maybe 15, 20 hours a week, mixed in with other things and mostly pushed to the weekends. Nowadays, it's maybe more like half an hour a day. But when I'm doing it, I'll be bouncing anywhere from zero to five parallel agents. I'll just kind of look at my to-do list and, like, oh, I've got active to-dos: like, I made a note about something I was annoyed by in this app, and I've had a longstanding idea for something I wanted to add to that app, and I just noticed a bug in that one. I'll tell this one, systematic debugging, fix this bug, and I'll tell this one, here's a small feature. I have another skill-- I don't even remember whether I wrote it or it's part of Superpowers-- but it's just, here's my description of what I want; figure out how to do it, deploy it, commit it, push it to GitHub, end-to-end. Don't ask me any questions unless you really need to. I have a different skill for, you should probably have a conversation with me about the best way to go about this. Yes, I can get up to four or five of those running. One kind of process change I went through halfway through this journey is-- everyone talks about you got to keep your agents fed, like, you know, tokenmaxxing, like, you know, if the agent's waiting for you, you're wasting time.
And I kind of went down that rabbit hole for a while, and it's very stressful. And then I realized, wait a minute, the agent's not important. I'm important. And that kind of wasn't true at first, because when I first started Claude coding, I had trouble using more than one agent at a time. I was initially only doing one project, and I didn't want all the complication of worktrees and three different things going on in the same code base and whatever. So there really was a sense that I had a long list of things I wanted it to do, and I was mostly sitting around waiting for Claude. When I was engaged, when my brain was in coding mode, I was mostly waiting for Claude, and so it was a real shame to let Claude be idle. Once I worked my way around to breaking things into multiple projects, I don't spend much time waiting for Claude anymore. Either because I have another project I can tell it to work on, or can start preparing its prompt for, or I've, like, gotten good enough at, okay, while it's working, I'm gonna go, you know, read my RSS queue or whatever. So now I think in terms of, I'm optimizing my time, not Claude's. You know, I'll give Claude its next prompt when I'm good and ready, when it's not an interruption to my mental workflow. And having this little status bar, and having the view of my inbox, of, you know, whether there's anything urgent I need to look at, and, you know, having all those tools, makes it easier for me to put my attention where I want it to be, whether it's on figuring out Claude's next prompt or something else.

Nathan Labenz: Do you mind sharing your token budget these days or token spend?

Steve Newman: Yeah, so I don't know, which means it's small enough that I don't have to know. So I'm on the $200 Claude plan. I recently signed up for the $200 ChatGPT plan, mostly because I wanted ChatGPT Pro, not because of Codex. I feel like I tried Codex once, like a month ago. I gave it one prompt-- I don't remember what it was, one coding prompt. It behaved abysmally, and I said, nah. That is not at all a fair test. Like, you know, take that as a Bayes update of 0.002. Like, there's no update there, but Claude was working well enough that I wasn't motivated to push harder. But I'm really loving ChatGPT Pro. And I'll abuse it for the-- I signed up for it for, I forget, some meaty research question. But I'll abuse it for, like, I'm driving up the San Francisco peninsula to meet a friend. My friend's taking public transit over from Berkeley. Where should we meet for lunch? Do a whole ChatGPT Pro's worth of investigation into that. And it's really handy, actually. But in terms of coding spend, it's all fitting within that. And I keep getting-- I have Claude set to, like, purchase credits in $30 increments. And I've been getting the, you just purchased another $30, more often recently. It's up to two or three times a week, which is enough that I should look at it. I think that's mostly generating all these summaries and stuff. And I'm really profligate-- like, I'm using Opus to generate these dumb little summaries and whatever, because, like, why not? It's not that much money. But I do have the, like, if I hit my Pro subscription limit, then flip to burning API tokens. So maybe that's some of it, but I don't think so. Long story short, I think all of my coding fits in the $200 plan.

Nathan Labenz: Yeah, gotcha. Cool. Any other tools you would shout out? You mentioned the superpowers, which I've heard of, but not used. So that goes on my to-do list coming out of this conversation.

Steve Newman: Yeah, so Superpowers is a very nicely, very elegantly packaged little set of Claude skills. It might be ported for Codex also, I'm not sure. That's the only set of skills I've really installed, and, yeah, nothing else comes to mind. Like, there's all these weird little utilities that Claude has sort of grabbed on my behalf, like that Hammerspoon thing, which is, again, you know, some deep Hacker News geek labor of love, I think, having glanced at it for three minutes and not really knowing what it is. But it's one of these labor-of-love, incredibly arcane toolkits that has just, like, 9,000 integrations through AppleScript or something, with, like, anything you can automate using a macOS API. And, you know, Claude installed this entire giant thing just so I could click a button on the Juggler app, and it will open a terminal tab. I was wondering how you did it. I'm not sure what they are. But in terms of things I interact with directly, it's really bare bones. It's the built-in OS terminal app, and it's Claude Code, and the Superpowers. And I feel... silly about that. Like, there's such a wealth of tools out there, but I keep not encountering reasons to try anything more.

Nathan Labenz: What I do these days, often, is when I see something that people are excited about, or, you know, whatever the viral thing of the moment is, I will ask Claude to dig into it and see what about it, or what ideas in it, might be useful to us. So rather than a direct install, it's kind of like a scout-it-out action first. And then rarely is it like, okay, actually, yes, let's pull that in. And more often it's like, oh yeah, there's a couple ideas here that were good, and we can kind of apply them to our own tower of jello in our own way and probably get 90% of the benefit of the core ideas. And I feel better about that also from a security standpoint, which is so out of character for me to even be thinking about at every turn. But I do feel like running it through that filter gives me confidence that I'm not doing something totally crazy.

Steve Newman: Yeah. Oh, that's a great point about security. And yeah, that seems like a very good approach and I have a whole document, like a list of links piling up of things like that to look at. And I never get around to looking at it.

Nathan Labenz: What's still hard? I mean, you mentioned some of those gnarly integrations with things that don't want to be integrated with. And that's hard on the level of, like, you're borderline hacking software that doesn't want to be... WhatsApp sounds like it's kind of open to being read that way, but it's not like... It sounds like that's not even documented, right? So that's definitely...

Steve Newman: Yeah, definitely. Claude was doing a lot of spelunking in the database schema to try to figure out, you know, what's a reply versus an original message? And how do you go from a user ID to a username? The integrations were far and away the worst part. And I have to think, of course, there are all these solutions for this. First-party providers are building MCPs and other APIs, and there's Tasklet and all these other services out there, and Claude Cowork. Everybody's building the integrations, and I have to think that's where a lot of value is going to be. But if you're just coding your own solutions, that's been really the hardest part for me, as someone with a lot of software engineering background, but rusty, and with no specific experience with the details of everything that's going into these apps. There's a lot I did to set up Cloudflare and hosting, and understanding how to connect an Android app through a Docker container into a web app and everything, which wasn't hard because I know how all that stuff works. I don't know how it would have been otherwise. But the main thing that's hard is not getting complacent. There's always the next level of productivity. And it's so hard to unlearn the habit of: the world is the way it is, and your tools are the way they are. Every now and then, maybe there's a new release, or you might look into a new tool, maybe I'm going to switch from Google Docs to Notion or whatever. But mostly, your tool set is just static, and so it's unlearning the idea that you have to adapt your workflow to the tools rather than the other way around.
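The kind of schema spelunking Steve describes can be sketched like this. Everything below is illustrative: the table and column names are a hypothetical message-store layout, not WhatsApp's actual schema, but the two questions it answers (reply vs. original, user ID to name) are the ones from the conversation.

```python
import sqlite3

# Illustrative sketch of reverse-engineering a message-store schema.
# Table/column names are hypothetical, not WhatsApp's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (jid TEXT PRIMARY KEY, display_name TEXT);
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        sender_jid TEXT REFERENCES contacts(jid),
        reply_to INTEGER REFERENCES messages(id),  -- NULL means original message
        body TEXT
    );
""")
conn.executemany("INSERT INTO contacts VALUES (?, ?)",
                 [("123@host", "Alice"), ("456@host", "Bob")])
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)",
                 [(1, "123@host", None, "Lunch?"),
                  (2, "456@host", 1, "Sure!")])

# Map opaque sender IDs to names, and distinguish replies from originals.
rows = conn.execute("""
    SELECT c.display_name,
           CASE WHEN m.reply_to IS NULL THEN 'original' ELSE 'reply' END,
           m.body
    FROM messages m JOIN contacts c ON c.jid = m.sender_jid
    ORDER BY m.id
""").fetchall()
for name, kind, body in rows:
    print(f"{name} ({kind}): {body}")
```

In the real exercise, the hard part is discovering that a column like `reply_to` exists at all; the query itself is the easy last step.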
And then figuring out what to do with it. Another thing I've always assumed I would do, and haven't done, is something that works more with the detail. So, you know, I've got this sort of unified inbox, and one e-mail I get every day is the San Francisco Chronicle daily newsletter. It's this long scrolling thing full of news items and ads, and I don't want to see the ads, and I don't want to see the food updates, and I don't want to see the sports articles unless they're about the Warriors. But I do want to see this and this. And I'd always assumed I would write some filter to show me just those parts, and multiply that times 50 other examples of e-mail that I get regularly. And I haven't gotten around to doing it. Maybe that's a good choice, that I'm subconsciously deciding it would be more trouble than it was worth. I think probably not. I don't want to have to build 50 features, so I have to come up with some conceptual framework for how I can get to the point where I just say one or two sentences about each one to Claude, and Claude can figure out what to do with it in a way that I will trust that I'm still seeing the information I need to see. And I just haven't motivated to take a step back. I probably need to dive in and fumble around and do it wrong the first time, and then eventually I'll settle in on a good way of doing that. So it's that push to actually take advantage of all the new opportunities and do the exploration. Another habit: my whole software engineering career, I always worked very hard to think things through, understand the problem up front, kind of measure twice, cut once.
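The conceptual framework Steve is reaching for above can be sketched as one short natural-language rule per recurring sender, applied section by section. Everything below is a hypothetical illustration: the sender address, the rule format, and the keyword-based `classify` stub, which a real build would replace with an LLM call that reads the rule's instruction, are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical sketch: one short rule per recurring e-mail, applied
# section by section. In a real build, classify() would send the rule's
# instruction and the section text to an LLM; here it's a keyword stub.
@dataclass
class Rule:
    sender: str
    instruction: str       # the "one or two sentences" given to Claude
    drop_keywords: tuple   # stand-ins for the LLM's judgment
    keep_keywords: tuple = ()

RULES = {
    "newsletter@sfchronicle.com": Rule(
        sender="newsletter@sfchronicle.com",
        instruction="Skip ads, food, and sports unless it's about the Warriors.",
        drop_keywords=("sponsored", "food", "sports"),
        keep_keywords=("warriors",),
    ),
}

def classify(rule: Rule, section: str) -> bool:
    """Stub for an LLM call: True means 'show this section to the user'."""
    text = section.lower()
    if any(k in text for k in rule.keep_keywords):
        return True
    return not any(k in text for k in rule.drop_keywords)

def filter_newsletter(sender: str, sections: list) -> list:
    rule = RULES.get(sender)
    if rule is None:
        return sections  # no rule yet: fail open, show everything
    return [s for s in sections if classify(rule, s)]

sections = ["City budget vote today", "Sponsored: mattress sale",
            "Sports: Warriors win in overtime", "Food: best new ramen"]
kept = filter_newsletter("newsletter@sfchronicle.com", sections)
print(kept)  # → ['City budget vote today', 'Sports: Warriors win in overtime']
```

Failing open when no rule exists is the trust property Steve mentions: a missing rule can only show too much, never hide information.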
Really understand the problem, make sure you've pulled out all the details of the use case, go and talk to the person who wrote the spec, find out what they forgot to put in, think through the four different ways you could structure the code. Do all that work up front, so that when you have to do all the long, tedious work of writing and testing and debugging the code, you get it right-ish the first try. I think that's totally wrong now. Just dive in, do it wrong, throw it away, redo it is so much the better approach now. It's not at all my nature, and so having to relearn that is also what's hard for me. Do you have thoughts on when to revert?

Nathan Labenz: This has changed for me and it's probably gonna change again. We just got 4.7 in the last few hours. So recognizing it's a moving target, I don't know. Six months plus ago, I would have told people that it's often easier to get the thing to work in one shot than to have it fail and then figure out how to fix it. So I used to advise: if it's not working and you're kind of stuck, if you're looping at all, revert back to the last known good state, try that prompt again, maybe say, here's a bit of what went wrong last time, and you'll probably have a better chance of getting it to work that way versus trying to get out of that stuck state. These days, I don't feel like that's as big of a problem. I haven't found myself doing that recently. Do you have rules of thumb or best practices for when you would press on versus fall back?

Steve Newman: It's interesting, I almost never find myself falling back, which is kind of shocking, and I don't particularly understand it. I think some of that is good prompting. So in the superpowers package, I forget the names of the individual skills, but one of the main themes is a skill that basically gets Claude to do a good job of thinking things through and running the big decisions past the user before it moves forward, and so forth. And some of it is the models getting really good. I didn't even start vibe coding until Opus 4.5, and then pretty soon it was 4.6, so I've been working with very recent, good models. That's my experience. So between recent models, that superpowers plugin... and I never look at the code. All this code is TypeScript; I don't know TypeScript. I literally never look at the code. But I do think about the high-level decisions that Claude is making. So somehow, between the models being good, the prompt, and me helping it avoid a few false paths, I almost never end up just giving up and reverting. It has happened; I can't remember specific examples. It is a thing that the model can wind up thrashing down a bad path, and then reverting is a good idea. But it just doesn't happen very often. Also, these days I'm mostly making incremental changes. There was a big push a month or two ago when I was building all those integrations, importing from WhatsApp and whatever, and laying things out. These days it's mostly just add a new feature, add a new feature. It's not making architectural changes, and it just doesn't go that wrong.

Nathan Labenz: Maybe the last practical question, and then we'll zoom out and try to take stock of what all this means. Any voice or mobile strategies? For me, again, I want to get out of my desk. I want to be on my feet instead of my ****. How do I do that? It's still a work in progress for me, for sure. Any tips in that direction?

Steve Newman: I see people talking about remote control of Claude, and all these terminal emulators on their phone and stuff. Those articles go in the pile of really good ideas for improving your vibe coding skills that I pile up and never look at. And then I sort of realized, like what I was saying: I don't want to optimize Claude's time, I want to optimize mine. I get out for walks a lot, for both physical and mental health, I think it's really good, and to give yourself brain space for deeper thought; I'm a big fan of that idea. But I don't go out for a walk so I can get five more prompts in. I go out for a walk so I can be in a different headspace. And so what I've settled into, that I really like, is that what I will do that's sort of work-ish while I'm out walking is I'll have some project I'm working on. Maybe it's the next blog post I want to write or the next piece of analysis I want to do, but maybe it's the next app I want to build in Claude. It'll be percolating in my head while I'm walking, and then I'll pull my phone out and just dictate a brain dump of ideas about whatever it is. So if it's an app, I'll just ramble out feature lists and design decisions and design questions, a brain dump of high-level thought about it. And then, this is a simple trick I read somebody talking about that's worked really well for me: you just take that brain dump and you paste it into whatever LLM and you say, organize this. You pretty much literally say, here's my brain dump, turn this into a Claude Code prompt. And I usually won't even bother to read what came out. It's usually good enough that I would rather wait and note, like, hey, you did this wrong. It's more efficient for me to let it go through all the work, and then notice, no, you misunderstood my brain dump.
I wanted this to work this other way. Rather than rereading its three-page cleanup of my brain dump. So that's what I do remotely. And as I said, this feels ridiculous, but the actual tool chain there is: I open the Gmail app on my phone, I click Compose, I type my own e-mail address, I click in the mail body, and I click the dictation button on the built-in keyboard. I am sure this is not the best way to do it, but it's always there. It always works. I don't have to worry whether the recording got saved. So that's my process.
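The "organize this" trick is simple enough to script. A minimal sketch: the wrapper wording is paraphrased from Steve's description, and `run_agent` is a hypothetical stand-in for handing the organized prompt to a coding agent.

```python
def organize_prompt(brain_dump: str) -> str:
    """Wrap a raw dictated brain dump in the 'organize this' instruction.

    Wording paraphrased from the workflow described above; any LLM can
    consume this and return a structured coding prompt.
    """
    return (
        "Here's my raw dictated brain dump about an app I want to build.\n"
        "Organize it: turn it into a clear Claude Code prompt with a feature\n"
        "list, design decisions, and open design questions.\n\n"
        f"--- BRAIN DUMP ---\n{brain_dump.strip()}\n--- END ---"
    )

def run_agent(prompt: str) -> None:
    """Hypothetical stand-in for piping the prompt to a coding agent."""
    print(prompt)

dump = "okay so I want a reading app um it should pull from RSS and rank by novelty"
organized = organize_prompt(dump)
run_agent(organized)
```

The point of not rereading the cleaned-up output is the same here: the dump goes in verbatim, and corrections happen after the agent has done the work.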

Nathan Labenz: Cool. So one thing I'm going to do after this is take the transcript and run it through a planning session and say, go figure out all the good ideas here that can apply to our setup, and I definitely think there are going to be several, at least. I mean, getting some hooks going. I'm struck by the fact that I have built almost no custom UI for myself at all. A lot of skills, but basically nothing that presents things to me. And I'm realizing now that's a gap and a half, even just in terms of producing the podcast. This is one of the classic paradoxes I find with AI: I wanted to be more efficient, and what I ended up doing was more, in maybe the same or even a little more time in some cases. One thing I'm doing now that I do get some positive comments on, though mostly nobody cares, is making a custom song for every episode. And there's a time factor there, where I've got to listen to the songs to figure out which of the versions I like, so it's definitely not a time saver. But I do enjoy it. I make YouTube thumbnail art type stuff, and video clips to help promote the thing on Twitter. And I realize I should have a UI where I can go look at what's been produced. I'm still digging around file systems, and sometimes I'm scrolling back in the terminal to go find the links that it printed out above. And looking at your setup, I'm like, boy, was I dumb for not thinking of something a little bit more like that sooner. So this is why I wanted to do this, because I was pretty sure there were going to be a few, not even technical unlocks, because there's almost nothing we've talked about here that Claude can't figure out how to actually implement for me without even needing to get into the details of what you've done.
But just the conceptual sort of scales fall from the eyes moments are quite useful.

Steve Newman: Building your own UI is really powerful, and I feel like most of the energy is not there. There's Claude Code and there's Cowork and there's Codex and Gemini and Antigravity. So there are all these different tools for coding and for more general agentic workflows, and tools like Cowork that integrate with you, but they're all things you throw commands at. And that works really well for a big task: go write this app, or go reformat these 800 PDFs or whatever. But it doesn't work very well for tiny actions like you were just talking about, go find this one file or whatever. It's annoying for you to have to open the file directory and navigate down and find the file, but it's not really going to be any faster to prompt Cowork or whatever to do that. So for all the little fine-grained things we all do every day, it's hard to get value out of agentic tools, whether they're command line or otherwise. Where you can get value is in a UI. Everybody's leaning into these tools for verbs, go do an action, but an app is sort of the noun side of it, and that's less explored.
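The "noun side" Steve describes, a place to see what the agents produced, can start as a single generated page rather than terminal scrollback. A minimal sketch using only the Python standard library; the `artifacts/` directory layout is an assumption.

```python
import html
from pathlib import Path

# Minimal "noun-side" UI sketch: render produced artifacts (clips,
# thumbnails, songs) as one local HTML page instead of digging through
# the file system. The artifacts/ directory name is hypothetical.
def render_index(root: Path) -> str:
    rows = []
    for p in sorted(root.rglob("*")):
        if p.is_file():
            rel = html.escape(str(p.relative_to(root)))
            rows.append(f'<li><a href="{rel}">{rel}</a></li>')
    return ("<html><body><h1>Artifacts</h1><ul>"
            + "".join(rows) + "</ul></body></html>")

if __name__ == "__main__":
    root = Path("artifacts")
    if root.exists():
        (root / "index.html").write_text(render_index(root))
        # Then browse it with: python -m http.server --directory artifacts
```

From here, an agent can keep extending the page (previews, timestamps, per-episode grouping) one small feature at a time.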

Nathan Labenz: A Chrome extension is another one that jumps out at me. Mine's probably different than yours, but there's definitely got to be one. And I took this note from Zvi, too, but I still haven't acted on it, because he's also got a Chrome extension that he uses to, I think, collect notes and reformat and move things around.

Steve Newman: I've seen him mention that. His is probably more sophisticated than mine. Seeing the format of his newsletters, you can imagine exactly what he's doing. I'm sure that's a big time-saver. I have a bunch of little Chrome extensions. Every time I join a Google Meet, I need to switch accounts because it always opens on the wrong account. Then I always want to hide my own window, and then you need to acknowledge that your video is still being sent. There's just five buttons I have to click every time I join a meeting. I built a Chrome extension that clicks the five buttons for me and it saves me 15 seconds, three times a day. Took a couple minutes to write. It's awesome.

Nathan Labenz: So let's do that zoom out. I don't know this story too well, but your last company was acquired in 2021, correct me if I get any of this wrong. You had basically created a better under-the-hood technology, and a bigger company with more customers wanted to use it to rebuild its product for the future on the new and improved technology you had developed, right? So then you became responsible, in the acquiring company, for actually making that happen. These projects, I've never done one personally, but they're legendarily excruciating, right? Because you've got a zillion features. I have a 100-year-old house, and I sort of feel like it's a similar thing: I always try to use a light touch around this house, because the second you peel one layer, you don't know what you're going to find underneath, and a lot of it's better off left alone. So that, I'm sure, was maybe a rewarding slog, but I'm sure it was quite a slog in many ways. How would that be different today? Would it be very different? A little different? Everything we've talked about so far is stuff we wouldn't have done before. But if you were going to go back and do that, something you did actually do before that was a big priority, how would you expect it to play out differently now?

Steve Newman: Yeah, it's a great question. And the thing I can say most for sure is: I don't know. Me trying to talk about it now versus actually doing it... I would find more ways it would be different if we were actually doing it. But the thing that comes to mind is that was very much a don't-move-too-fast, please-don't-break-anything kind of project. We were swapping out the data storage and query engine from the flagship product of this company that was getting ready to IPO. If there was one thing they did not need, it was disruption in the production service. Very briefly: they're a security company, and they had an endpoint agent installed on millions of customer laptops and servers and other computers, ingesting data from all those things and integrating with a bunch of cloud services their customers were using. So, pulling in massive streams of events, certainly many billions, probably trillions of events per day, needing to store those for months and be able to query them rapidly and so forth. A serious piece of distributed systems engineering that had to work at high reliability and accuracy. And we were swapping the query engine out from under it. What we would not be doing today is vibe coding the actual implementation of that. In a year or two, I wonder, because things are moving really fast, but not today. But a huge part of that project, exactly to your point, was that there was just so much we didn't understand. You had two different teams: our team, which had built the new engine, and the acquiring team, which had built the old engine and the product, or rather suite of products, on top of the engine. And so no one had a full picture.
And the existing product, the acquirer's, had accumulated a lot of cruft and arcane knowledge, parts of the system that no one understood very well, walls that hadn't been opened up in a couple of years. So, for example, one of the things we were asked to do early on was come up with a budget estimate. How much should we plan? We're going public, we have to provide forward-looking financial projections: what are we going to be spending on AWS next year after we've done this port? And we were like, I don't know how much data and how many queries. And we needed a lot of detail on that. A query that looks at one day of data is very different from a query that looks at 30 days of data. A query from a huge customer with a lot of data is very different from one from a small customer; a complex query is different from a simple query. And we couldn't get that information. It was very hard to get; no one really had it. Maybe there were one or two senior engineers who kind of understood that stuff, but they were really busy and couldn't take time to answer our questions. There was a lot of trying to get access to the right systems, manually poking around, looking at logs and running queries and understanding what is even going on here, what is the shape of this data, what are the query patterns, and then lots of detailed questions underneath that. It would have been so amazing to throw Claude Code at this and say: here are a hundred questions I have about the data. Go write a hundred tools, run a hundred investigations, and give me a hundred reports, then give me a distillation of those hundred. Look at all of them and tell me which of the hundred reports I should probably read.
And come up with a cost estimate and explain how you arrived at it, and let me poke holes in it. None of that work touches production. If it goes wrong, it's our responsibility not to trust it, but it's not going to take the system down. So you can much more just let Claude try, or whatever agent, I keep saying Claude, whatever tool, let it try things. So: data gathering. This is a theme I've seen a number of people talk about: there are so many things around the production system, internal tools to let you look at your own logs, look at your own data, see what's going on, look for bugs, look for patterns, that are not production-critical systems, because they're just informing you, the human user, who can still apply judgment. That would have saved us a lot of time there, and we would have done a lot more of the investigation. It's just hard to do that stuff manually.
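The hundred-investigations idea above is essentially a fan-out/fan-in loop. Everything in this sketch is hypothetical: `investigate` and `distill` are stubs that a real build would replace with read-only agent calls, so nothing touches production.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fan-out/fan-in sketch of the "100 questions -> 100 reports
# -> 1 distillation" pattern. investigate() and distill() are stubs a real
# build would replace with agent calls over read-only data access.
QUESTIONS = [
    "What fraction of queries span more than one day of data?",
    "How skewed is data volume across customers?",
    "What are the top 10 query patterns by frequency?",
]

def investigate(question: str) -> str:
    """Stub for one read-only agent investigation producing a short report."""
    return f"REPORT on {question!r}: (findings would go here)"

def distill(reports: list) -> str:
    """Stub for the summarization pass over all reports."""
    return f"Distillation of {len(reports)} reports; read these first: ..."

# Fan out the investigations in parallel, then fan back in to one summary.
with ThreadPoolExecutor(max_workers=8) as pool:
    reports = list(pool.map(investigate, QUESTIONS))

summary = distill(reports)
print(summary)
```

The safety property Steve emphasizes lives in the structure: the parallel stage only reads and reports, and the human applies judgment to the distillation.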

Nathan Labenz: How would you hire differently today if you were trying to build a software team for the Opus 4.7+ era?

Steve Newman: Yeah, I mean, this is another one where I'm sure I don't know, and we're all iterating and learning very rapidly. But two high-level themes I'll point out. One is, I suspect what you really want now are people who can think outside the box, because there's no box anymore. The box is established practice. There are a lot of engineers who've made their careers on: I've read the Design Patterns book, I've read the style guide for React or whatever, I know this is a good database design pattern and that's a bad one, we have decades of industry experience telling us this. I've learned to understand when that pattern is good and when it's bad. I've seen it all before, or I've studied from people who have, and I know the right way to do things, the established way, and I'm going to do that. And that's all out the window. I mean, you can keep doing that today, but then you're not taking advantage. I think you said something earlier in this call about how we're all figuring it out, we're all doing new things, we're all telling Claude to extract gestalt summaries of what everyone else is doing. It's all new. Oh, I should start building custom UIs; I hadn't been doing that. There's no best practice that I'm aware of, certainly no established best practice with a history behind it, for how to vibe code your own UI tools for your custom workflow. So we're all making it up as we go, all kind of idiosyncratic, and no one's distilled the big patterns yet. What makes sense for me isn't going to make sense for you, because you have different tools, different needs, different situation, different skills, different preferences. There may be some pattern that's going to underlie a lot of what we all want, some set of design principles, but no one knows them yet.
People are posting things, but they're all obsolete a week later. They're not at the level of depth and quality and universality of what gradually emerged over decades of more traditional software development. So, being able to think outside the box, being comfortable navigating without a map, I think is really important. And then, and this I'm just guessing, communication skills are really important. I don't know what it feels like to be doing professional production work, running a software company today, because I'm not doing that right now. But what I hear from the people who are is that what used to be a team is now a person, and what used to be three teams is now three people, because everyone's running their own suite of agents. So each person needs to do a whole team's worth of coordinating with other people. My guess is communication skills are important, and there's some overlap between being a good communicator with another person and being a good communicator with an agent.

Nathan Labenz: Yeah, I think quite a bit, certainly in the software domain in particular.

Steve Newman: Yeah.

Nathan Labenz: So what do you think? One thing you've written about notably is the importance of threshold effects and phase changes. And it seems like we've passed some important thresholds if we're already at a point where what used to be a team is now a person. I'm looking back at all the tabs that you showed, and in some sense it seems bullish for infrastructure: it's bullish for GPUs, it's bullish for Anthropic, it's maybe bullish for Cloudflare. It's probably bearish for the app layer broadly, because one thing you didn't see is, like, any SaaS app, right? All of that is just your own stuff. Do you think we've already passed thresholds where the sort of software engineering job apocalypse is inevitable? Or do you think there are still thresholds to come? Or maybe there's somehow going to be so much demand for software? I have a hard time seeing that one, given how easy it is to create one's own little nest. But what do you think the future of the industry looks like? And are there any key moments or key unlocks that you're still looking for before you would change your expectations?

Steve Newman: Yeah, great questions. I don't know. This is one of the big questions, right? The number of engineers we need per line of code is plummeting. The number of lines of code is soaring. Does that add up to more jobs or fewer jobs, today, in three months, in a year, in two years? I don't know. I do believe we are going to be building so much more software that it is possible Jevons paradox is going to maintain its strong track record, and the number of coding jobs will go up, not down. The nature of the job will certainly evolve, almost to the point where maybe, some years down the road, the accurate statement will have been: just like in the transition from horses to cars, I imagine the number of transport-related jobs increased, but they were not at all the same jobs. Maybe software engineering as such is dead and full-stack product manager is the giant new job market, or maybe in some other way they're different jobs, and it may not always be the same people doing them. So I don't know. I think it is very plausible, maybe even probable, that the number of jobs isn't going to go down, at least until we get to a point where all the jobs, or at least all the non-physical jobs, start going away because AI is just better at everything. Short of that, I wouldn't be surprised if there are still lots of human beings somehow involved in software development. But what does that mean for, like, SaaS companies? You didn't see me running any SaaS apps, partly just because I didn't bother to show that part; people have seen Slack before. I'm still using Gmail, I'm still using Slack, I'm still using WhatsApp. But I'm mostly using them as a back-end service. I spend less time in their UI, and I care less about their feature set. And this is another tug of war that's going on right now.
And we're seeing this play out, right? A lot of companies are flirting with cutting off API access. Slack has talked about this. Companies don't want your agent, your Claude or whatever third-party agent, in their app. They want you in their app. Amazon restricting shopping agents, and whatever. So there's a real tug of war here. It's in the service provider's interest, probably, to keep you in the app, because then you're more locked in: you're getting deeper value out of the app, you have more of a relationship with it. But it's in the user's interest to be able to use the best agent for the job, whether that's a first-party agent or a third-party agent. And I don't know how that tug of war plays out. Salesforce may decide to really lock down API access to Slack, so your Claude can't talk to your Slack, my vibe-coded app can't talk to my Slack. And if they do that, their customers may roll over and spend time in the native Slack, or they may move off of Slack. I have no idea how that's going to play out, and it's going to play out differently in a lot of different domains. Until several revolutions from now, until things have really changed a lot, we're not going to be vibe coding our own private infrastructure. There's going to be a need for Amazon S3 and Google Spanner and the big data backbone apps, and probably the next level up from that, the Slack and the Salesforce and whatever. Maybe not Salesforce as a UI, but Salesforce as a place where data lives. Or if not Salesforce, then at least certainly the broad database level. That stuff is not going to get vibe coded until, you know, you have a Jeff Dean on the command line. And that'll be a while. I'm being very vague here.
At least that's not the next shoe to drop. And then, you talked about thresholds.

Nathan Labenz: How close is Mythos to Jeff Dean, though? Do you have a-- Sorry? How close is Mythos to Jeff Dean?

Steve Newman: I only know what they said in the model card and whatever. I think still not very close. Actually, let me come back to that; I want to talk about threshold effects for a second, then I'll come back to it. Sure. Yeah, I don't know what specific next thresholds I'm looking for. I can't guess what it's going to be, but I was thinking about this a little bit. We all talk about AI capabilities: oh, 4.7 just dropped, what new thing is it going to be able to do? But in our day-to-day, as we're engaging with these tools and living through their impact on the world and the environment we're in, we're not really engaging with model capabilities. We're engaging with a whole complicated ecosystem of: what could the model do if you prompted it well, and how well are people prompting it, and who's using it and who isn't, and what second and third and fourth order implications does that have? And I think this is where the threshold effects come from. The ChatGPT launch was a moment. Moltbot was a moment. Not because that was the day the scaling curve crossed some threshold, but because that was the contingent moment when capabilities had gotten far enough, and there was a little bit of an overhang, and then someone happened to do something, and they happened to do it in a way that caught people's attention. Maybe it's a little like how there have been coronaviruses circulating in bats that have an R of 0.9 in human beings: every now and then one will cross to a human being and maybe infect two or three more people and peter out. And then one day there happened to be one that had an R of 1.1, and it infected a couple more people, and a couple more, and it was evolving and getting better at spreading in humans. That was a threshold effect, where suddenly COVID exploded through the human population.
But only because of the dynamic effect. It wasn't about what that virus did in one person; it was about the way it went from person to person to person. It was the forward-evolving system of people and virus that had tipped over into a new domain. And, on a more complicated level, that's what's happening with AI. I'm part of a big cohort of people who started Claude coding in December, because 4.5 was out and it was the December break. A lot of this wasn't because of a specific new capability; it was that I was reading other people saying 4.5 was worth trying.

Nathan Labenz: So, going back to Mythos and Jeff Dean, and maybe generalizing the question a little bit: obviously, either we haven't had access to Mythos, or we're sworn to secrecy. I haven't. You may be sworn to secrecy.

Steve Newman: I haven't.

Nathan Labenz: Here's another quote I pulled from Second Thoughts: "AI's impact is the product of eight separate factors: pre-training, post-training, inference compute scaling, agent scaffolding, app design, user aptitude, workflow refactoring, and adoption. All eight are advancing, some quite rapidly. They will multiply out to a blistering pace of change." It's a little silly to be speculating too much about a model that we haven't seen, but the thing that stuck out to me the most, that had me thinking, geez, I don't know, maybe it is entering Jeff Dean territory, or at least could be a major Jeff Dean multiplier, was Nicholas Carlini's statement that he'd found more bugs in the last few weeks with Mythos than, I think he said, the entire rest of his 15- or 20-year storied career combined. So maybe I was asking the wrong question, because it's less a substitute and more a complement, more of a multiplier effect. But to juxtapose that quote: in other things you've written, and in some conversations, you've seemed skeptical of the most "singularity is near" kind of takes. Where are you now? Are there places where you still see reason to be meaningfully skeptical? Or are you kind of like, yeah, we're headed to Jeff Dean territory at some point, and it's just a question of exactly how many generations and how many months that may be?
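As a side note on the quote above: the force of "they will multiply out" is that the eight factors compound rather than add. A toy calculation (the 20% per-factor figure is purely hypothetical, not from the quote) makes the arithmetic explicit:

```python
# Illustrative only: the eight factors from the Second Thoughts quote.
factors = [
    "pre-training", "post-training", "inference compute scaling",
    "agent scaffolding", "app design", "user aptitude",
    "workflow refactoring", "adoption",
]

annual_gain = 1.20  # hypothetical: each factor improves 20% per year
combined = annual_gain ** len(factors)  # factors multiply, not add
print(f"{len(factors)} factors improving {annual_gain - 1:.0%} each "
      f"-> {combined:.2f}x overall per year")
```

Eight independent 20% improvements multiply out to roughly a 4.3x overall gain per year, versus only 2.6x (1 + 8 x 0.2) if they merely added, which is the "blistering pace" intuition.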

Steve Newman: Yeah. And again, we haven't seen the model, so with that caveat, I'm going to talk about why I still have some skepticism. And I want to say, before I forget, that it's getting a little harder to maintain the courage of my convictions here. The way I think about it is, the tug of war of the conversation is usually either "these things still aren't that capable; AGI, without quibbling about definitions, is really far off," or "are you kidding me? Mythos found all these vulnerabilities, and Nicholas Carlini said that, and look at all these amazing things they're doing." So it's as if you either have to say the models are amazing and we're basically at AGI, or the models have all these flaws and we're so far from AGI. What I think is: the models are amazing, and we're still far from AGI. And that term has gotten so useless. But something that's Jeff Dean, and Terence Tao, and, you know, pick your hero, something that's smart at all the things, in all the human ways, I still think we're quite a ways from. Now, as someone said, I think it was Helen Toner: long timelines aren't what they used to be. And "quite a ways" might only be five years now, which is a remarkable thing to say, but I still think something that can match the depth and range of human capability is some distance away. The kinds of discernment and judgment and depth of pattern recognition that go into human expertise in whatever field, it's hard to remember how much that encompasses. So we see Mythos, which is an absolute beast at identifying certain categories of security flaws and, big scary new step, actually piecing together working exploits for many of them. That's really impressive.
Put 300 points on the board for the models. But I think "smart at all the things" is 50,000 points. We forget how far off that still is, in part because when you ask it a question, it has an answer. It's hard to dig down deep enough into the model now to get to the point of lack of capability. You're not going to get there in a chat session; you're only going to get there in some really serious work. And I think there's a little bit of a blind spot: we don't ask it to do the things it can't do, because it can't do them, and so we don't see people talking about it failing at them. I find it harder and harder to articulate what I mean, but I have a strong sense that there are whole categories of things AI still really can't do that we just don't even think about when we think about AI. A lot of it has to do with context. AI couldn't have a dinner conversation with my wife for me, obviously, for a hundred reasons, and some of those are just dumb reasons, like it doesn't know the history. But I think we therefore also don't think about all the really subtle capabilities it's probably missing. That was kind of a silly example. But again, I find it frustratingly hard to come up with good examples of what I'm talking about, which makes me worry a little that I'm full of it when I say this. But that is my gut.

Nathan Labenz: I find it harder and harder to come up with reasons not to think some sort of singularity is near. Obviously, the physical world is lagging, so that's one massive category. Although it's harder to evaluate; we don't typically get our hands on the actual humanoids in the same way, but some of the videos are starting to look pretty impressive. I think they're probably still a ways away from coming in and doing plumbing in my 100-year-old house, but they can handle rough terrain at this point. That's pretty clear.

Steve Newman: I'll tell you a few specific questions I have, because, again, it's getting harder and harder to confidently say that we're not on the cusp of some kind of recursive self-improvement takeoff. Three specific things I think about; let's see if I can remember all three. The first is: clearly the models are roaring through software engineering and into other parts of AI R&D, like experiment design and so forth. One thing I just don't understand is, how deep is that rabbit hole? I would love for someone who really knows, and there are probably about a thousand people in the world who could do a good job of this, though most of them are in a position where they can't, and maybe it's an even smaller number, to explain what really goes into making the models better. Okay, we know there's a lot of just coding. We know there's thinking about ways to tweak the learning algorithm or the data curation or whatever. We know there's designing RL environments. But what really do you need to know? What's the set of skills to build an RL environment? What's the difference between a junior developer and someone who's been building RL environments since the beginning of the project that became o1, with several years of experience at it? What taste and judgment and discernment about what makes a useful RL environment, one that's really going to push the model's capabilities, what kind of subtle judgment do you need there? Is there some other big aspect to it, like knowing how to manage? Is it still important to have the 10,000 human experts weighing in on a bunch of different subjects, and is there some very high-level skill of knowing what questions to ask the human experts and how to manage that? How much is feedback from people using the models, and what are the high-level skills in making that happen? What is the list of capabilities that have to be checked off
for the models to really automate their own improvement? And are there sections of that which don't look very much like continuing to get better at coding? I don't have a feel for that, and I wish I did. I wish we knew more about the higher-level and more obscure corners of making models better. So that's one question. You could tell a story that we're within a year of automating AI R&D. You could tell a story where, somewhere in that ramble I just went through, there are some pieces that are going to take significantly longer. I don't know which. Then, supposing that hill gets climbed: now we're going to have agents that are superhuman at coding, superhuman at math, superhuman at all the easy, objectively gradable stuff. How easily does that generalize to superhuman at marketing, business strategy, managing a team, teaching a class, and a lot of other softer, fuzzier skills? Or product design, and physical and mechanical engineering, all these other things that aren't purely digital, tasks that bleed into the physical world or the social world? When we automate AI R&D, are we then going to slam to the top of all those other skills? Or is there some "AI as normal technology" factor that's going to get in the way? And the third is robotics, which you mentioned. I hear a lot about not getting too impressed by the videos, because there's a very big gap between a robot that can be scripted to do a predictable thing once, versus a robot that can incorporate tactile feedback, have real-time reflexes, and deal with a messy task over and over and over again in its messiness.
So does that mean we've got another Rodney Brooks 30 or 50 years, and I'm not sure I'm characterizing Rodney Brooks accurately, but do we still have a long, long, long way to go? Or is the superintelligence we were just speculating about going to slam through the rest of those tasks as well?

Nathan Labenz: Yeah, certainly in robotics, a 50% success rate on tasks will not cut it in one's home, nor will 80%. But we do see self-driving working at a high level now. The thing I cite to my dad, who's a skeptic of self-driving, is that when the insurance companies start offering you discounts for using it, you kind of have to believe the hype, right? They're very, very incentivized to be skeptical about any and all such claims.

Steve Newman: I'm a big believer. I feel like Waymo, at least, has arrived. I'm less clear about the other providers, but certainly Waymo has, and that's a very solid existence proof. It's clearly safer; there are edge cases where it maybe isn't, but on balance, I would much rather Waymo be driving than me, and I would much rather Waymo be driving than the people around me. But again, that's a question. That took decades and decades. Over and over and over again, there were dramatically over-optimistic predictions, and we're still not all the way there. Waymo is still not in public service in the snow, and there are all kinds of edge cases that haven't been crossed yet. So one story is, okay, we've got another 30 years to go on robotics, if you go by that example, or, I don't know, maybe not 30, but a long time. And by the way, driving is still a fairly limited domain, and one where "okay, I give up, I'm going to pull over and stop" is a button you're allowed to push. You can push that button more than once. So in some ways, it's still a relatively controlled, limited domain: there are only about two degrees of freedom on a car, fast-slow and left-right, and that's the number of degrees of freedom in one finger. So you could tell a story that that means robotics is hard, or the other story would be, yeah, robotics is hard, but we're going to have superintelligence and it's going to plow through the hard parts. I can't rule that out, but I don't feel like it's demonstrated.

Nathan Labenz: Yeah, okay. People should subscribe to Second Thoughts for more of your evolving thoughts as we go. One aside, just to check in on, is the relationship between AI and climate. I actually just learned, in preparing for this conversation, that prior to focusing your sense-making abilities on the AI space, you were focused on trying to make sense of the climate questions. And one of the posts from the climate era of your writing basically said it seems like AI is not going to be a big deal for emissions. And I did an Andy Masley episode basically trying to make that argument from a bunch of different angles. Have you seen anything? This could be a very simple "yep, no big change" answer, but has there been any change to your worldview about the intersection of AI and climate?

Steve Newman: So I don't follow this as closely as I used to. My sense is I was a little bit wrong, because I didn't anticipate just how rapidly data centers were going to scale, and I think I overindexed. Two things have changed for me. One is that electricity usage from AI has increased substantially and seems poised to keep going; it's an exponential, and every year the exponential is much more dramatic than the year before. The other thing I did not see coming is that, in the land rush to get more gigawatt capacity for data centers, the hyperscalers are, to my understanding (I'm not following this closely), really backing off on their climate commitments. There was a lot of really good talk and action: Microsoft, Google, a lot of the hyperscalers were doing really good things for climate. They were early purchasers of various forms of clean power and of real, hard offsets. And, again to my understanding, I'm a little fuzzy on this, you certainly have examples like xAI opening the Colossus data center and just trucking in the quickest, least efficient gas turbines they could find, and I think the hyperscalers are doing some of that as well, getting whatever power they can, even if it's gas or something. So those add up to: AI is using a lot of power, and a lot of it isn't clean power. That's not good for the climate. But even from a climate perspective, I don't feel like that's the main story. The main story is that AI is going to reshape the world, and the broader effect is what matters. It's either going to lead to advances in materials science and other things, and we're going to really solve batteries,
and we're going to find ways of turning all of our messy petrochemical-based chemical processes into cleaner electrochemical processes. Suppose AI increases emissions from electricity production by 20% globally, which would be huge, right? You could also see, in that world, AI making the overall planetary industrial base 20% more efficient: micro-targeting, robotic agriculture, using robots to kill insects instead of pesticides, more carefully targeted fertilizer, all these things. So it'd be easy to see the reduction in emissions from the rest of the economy outweighing the emissions from generating electricity, especially because, my sense is, in the long run it's primarily solar and various other clean technologies that will make economic sense at scale. We're not going to be building a terawatt of gas generation capacity for the someday terawatt of data centers; that's going to be cleaner, it still seems to me. So in the big picture, in the long run, I feel like it's probably going to be good for climate. But more generally, AI is going to roll the dice on the whole world, and climate is going to be along for that ride, whatever direction it goes. In the short run, we may be burning more fossil fuels to power more data centers.

Nathan Labenz: Cool. Thank you. Great answer. And that note on rolling the dice with the whole world is maybe a good segue to the last section I had for you, which is basically: what's going on with the Golden Gate Institute? I've had the good fortune of attending the Curve the first two times it's been run. It's been a well-loved and much-discussed event, but it's certainly not the totality of what you're up to. So tell us what the mission is, the range of activities, and how people can support it, find ways to get involved, or get on the Curve wait list, whatever the case may be.

Steve Newman: Yeah, so the Golden Gate Institute for AI is a non-profit I co-founded last year. Our mission, basically, is this: as we get ready, as you said, to reroll the dice on the whole world, as AI moves forward so rapidly, having impacts and promising further impacts, we're all having to navigate it. Individuals, consumers, business leaders, policymakers, civic organizations, civil society: everyone either has or is going to have some role to play in how AI plays out, and is certainly going to have to prepare for, react to, and steer through a lot of changes, direct and indirect. So there's a lot we all have to collectively figure out, and it's really hard. You and I just had a two-hour conversation about how much trouble we're having keeping up and predicting what's going on, and that's sort of both of our jobs. It's really hard to make sense of all this. So our mission at Golden Gate is to try to contribute to that collective sense-making project. The problem we're trying to address is that there are so many different pockets of knowledge and expertise and viewpoint. Understanding what's happening with AI is a computer science and machine learning question. It's an economics question. It's a cybersecurity question. It's a biosecurity question. It's a labor market question. It's a political question. It's an education question. And no one person, no one group, has even all of the information, let alone the perspective, let alone all the answers. There are all these isolated pockets: people in a particular field, people in San Francisco versus people in Washington, D.C., people on the political left and people on the political right, all figuring it out on their own and not necessarily engaging with the other groups. And so our mission is to bridge those gaps.
And we do that through publications, like my blog, Second Thoughts. But a lot of it we do by getting people together in person. You mentioned the Curve, which is our flagship activity: a conference we've been running annually where we get together about 350 people from every one of those communities, from every walk of life, certainly from every corner of the broader AI cinematic multiverse. And we've found that when you get people together face to face, it's trite but true: it really changes things. People who have been yelling at each other on Twitter, or more often just ignoring each other, will have a conversation. They'll find things they have in common. They'll remember that the other person is a human being with reasons for their viewpoints. And we really see things coming out of this. Coming out of the conference, we've seen projects come together and new working relationships form, but also just more engagement, breaking down some of these barriers between the groups. As for what's helpful, I should have a better answer ready to mind, but we're about to announce the date for this year's conference. Spoiler alert: it'll be October 2nd through 4th. So go to our website, or subscribe to the blog, Second Thoughts; there'll be an announcement of that. Read the blog, and if you're interested, comment or reach out; we're always looking for engagement. A big part of what we're able to do is a function of our network: the more people we're connected with, and the more directions we're connected in, the more we can do. For example, it's been hard for us to really connect with some of the communities outside of the U.S., first and foremost in China, but also every other part of the world. We'd like to build our network more there.
We're not as connected as we'd like to be in the robotics industry, just as another example. So if you're interested in what we're doing, get in touch, and apply to the conference if you're interested. We're very small, but we're expanding. We've been growing the team. We're running the Curve again this year, and we're hoping to run it twice next year. A year is a long time in AI; my colleague Taryn was joking that we probably need to double the number of Curves every year going forward. So we're going to be doing more, and there will be more opportunities to get engaged. First and foremost, if you follow us, you'll hear about these things.

Nathan Labenz: Cool. This has been excellent. I'm already looking forward to adding many enhancements to my personal productivity setup based on all your examples. Anything I should have asked, or anything you'd want to leave people with before we break?

Steve Newman: No, that was great. This has been a ton of fun. I love the podcast, and it's been really fun to get to be on this side of the microphone, because you ask really great questions and you've got me thinking about a lot of things. So thank you.

Nathan Labenz: Thank you very much. That's very kind. The blog is Second Thoughts. Subscribe, and watch out for the Curve, coming with apparently doubling frequency going forward. Steve Newman, thank you for being part of the Cognitive Revolution.

