The Perfect Substrate for AGI, with Replit CEO Amjad Masad

In this episode of The Cognitive Revolution, Amjad Masad, founder and CEO of Replit, discusses the fast-paced growth of Replit, the concept of 'vibe coding', and the challenges and opportunities in the rapidly evolving AI-assisted coding space.



He shares insights into Replit's competitive advantages, known as 'moats', and how they maintain their lead through innovation and excellence in execution. The conversation delves into the architectural complexities of building AI agents, their autonomy, and the practical implications for businesses. Amjad touches upon the potential and limitations of current AI models, the future of general intelligence, and the preparation needed for potential AI advancements. He also highlights the importance of intuitive user interfaces for AI-driven software development and the critical role of human-like interaction with AI agents.

Upcoming Major AI Events Featuring Nathan Labenz as a Keynote Speaker:
https://www.imagineai.live/
https://adapta.org/adapta-summ...
https://itrevolution.com/produ...

SPONSORS:
ElevenLabs: ElevenLabs gives your app a natural voice. Pick from 5,000+ voices in 31 languages, or clone your own, and launch lifelike agents for support, scheduling, learning, and games. Full server and client SDKs, dynamic tools, and monitoring keep you in control. Start free at https://elevenlabs.io/cognitiv...

Oracle Cloud Infrastructure (OCI): Oracle Cloud Infrastructure offers next-generation cloud solutions that cut costs and boost performance. With OCI, you can run AI projects and applications faster and more securely for less. New U.S. customers can save 50% on compute, 70% on storage, and 80% on networking by switching to OCI before May 31, 2024. See if you qualify at https://oracle.com/cognitive

The AGNTCY: The AGNTCY is an open-source collective dedicated to building the Internet of Agents, enabling AI agents to communicate and collaborate seamlessly across frameworks. Join a community of engineers focused on high-quality multi-agent software and support the initiative at https://agntcy.org/?utm_campai...

Shopify: Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive

NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive


PRODUCED BY:
https://aipodcast.ing

CHAPTERS:
(00:00) About the Episode
(05:39) Introduction and Welcome
(06:13) Replit's Rapid Growth and Impact
(07:21) Understanding Competitive Moats
(10:18) AI Coding Assistants and Market Dynamics
(11:27) Replit's Unique Approach and Infrastructure
(13:45) Model Training and Evaluation (Part 1)
(18:01) Sponsors: ElevenLabs | Oracle Cloud Infrastructure (OCI)
(20:28) Model Training and Evaluation (Part 2)
(27:12) Agent Architecture and Future Directions (Part 1)
(34:18) Sponsors: The AGNTCY | Shopify | NetSuite
(38:40) Agent Architecture and Future Directions (Part 2)
(39:35) Challenges in AI Software Development
(40:03) The Ideal User Interface for AI Software
(41:04) Treating AI Software Creators as Developers
(42:43) The Importance of Power Tools in AI Development
(43:19) Visualizing AI Agent Activities
(47:00) The Future of Routine Work Automation
(48:29) The Debate on Learning to Code
(51:23) Preparing for AGI and AI Safety Concerns
(58:38) The Role of Data in AI Progress
(01:00:37) Closing Thoughts and Future Prospects
(01:01:30) Outro

SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathan...
Youtube: https://youtube.com/@Cognitive...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...


Full Transcript

Transcript

This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.


Introduction

Hello, and welcome back to the Cognitive Revolution. Today, I am super excited to be speaking with Amjad Masad, founder and CEO of Replit. Longtime listeners will know that I've been using and recommending Replit since back when Amjad still thought humans should learn to code.

I have always loved how quick and easy they make it to start developing a new project, and their mission of bringing the next billion developers online made them natural early adopters of large language model coding assistants too. At one point, they even worked with MosaicML, another past guest, to train their own code model. These days, with their new AI agent available to help users of any skill level and simple but solid Cursor and VS Code support for when you need to call in a professional human developer, plus the ability to deploy to the web in just a few minutes, I continue to believe that Replit is the strongest overall vibe coding platform on the market today. And that's no shade, by the way, to any other platform. A number of platforms available today would have blown everyone's minds less than a year ago, and certainly everyone I've had on the show is doing excellent work.

All that said, what really got me watching Replit closely and sets the stage for what makes this conversation so interesting was when Amjad started describing the Replit platform, at least as far back as early 2023, as, quote, "the perfect substrate for AGI." This vision always rang true to me because I truly can't think of a better natural habitat for an early disembodied AI than the super accessible, highly configurable, and very scalable computing platform that Replit operates.

And indeed, as you'll hear, there is a lot to learn from Amjad and the Replit team's experience developing both an interactive AI coding assistant and a more autonomous AI coding agent. One major takeaway is that the transition to large scale reinforcement learning has not been without practical trade offs. Fascinatingly, while Claude 3.7 powers the agent and it can definitely do amazing things - just the other day, for example, it coded a web app for me from scratch, some 30 files in all, turning what would have been days of work into a few hours' project - the interactive assistant, meanwhile, is still powered by Claude 3.6. Why is that? Because as Amjad describes it, 3.7 is just fundamentally more agentic and harder to control in a simpler turn based mode of interaction.

Now I won't spoil Amjad's story of how Claude 3.7 circumvented some of the early control measures that Replit put in place for their AI coding agent. But as you listen to him tell it, I would call your attention to how his direct experience with such colorful bad behavior from AI models does seem to be creating some potential common ground between the AI safety and the accelerationist perspectives. Amjad has described himself as an accelerationist for years and continues to do so in this conversation. But at this point, the security expertise that Replit developed while battling crypto miners and other human scammers is becoming directly relevant to their work with AI. And Amjad, as you'll hear, is increasingly open to the possibility that a cybersecurity Move 37 could happen in the not too distant future.

This sort of recognition that the capabilities curves don't yet seem to be bending and that more and more of the bad behavior that the AI safety community has been worrying about for years is in fact starting to happen is actually popping up in more and more places. Recently, I was struck to see that even the pseudonymous BasedLord, who no less than Marc Andreessen had anointed a patron saint of accelerationism, said of OpenAI's latest model, quote, "o3 is smart, but keep an eye on that motherfucker." From my perspective, if even BasedLord feels compelled to call out AI bad behavior, there's clearly hope of mutual understanding and collaborative problem solving going forward.

And by the way, speaking of Marc Andreessen, as you might have heard, the Turpentine Podcast Network, which the Cognitive Revolution belongs to, was recently acquired by a16z with Eric joining the firm as a partner. Turpentine has been an excellent partner for this show, allowing me to do so much more than I ever could have done alone while also contractually enshrining and practically supporting my editorial independence. So I am genuinely excited to continue the work under the new arrangement. I've had a number of great conversations with a16z partners over the last couple years, including my fellow AI scouts, Justine and Olivia Moore, and I absolutely look forward to continuing to bring a wide range of perspectives together to understand the latest AI developments on their own terms and to reckon together with the implications and discuss what we should do about them in good faith. This conversation with Amjad, my recent one with New York assembly member Alex Bores, and the series of episodes we did last year on SB 1047 are all good examples of this. And you can look forward to guests who represent the official a16z perspective on AI policy coming up soon as well.

As always, if you're finding value in the show, we'd appreciate it if you take a moment to share it with friends, leave a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, I always welcome your feedback as well either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network.

Finally, for now, a quick reminder: I'll be speaking at Imagine AI Live, coming up next week in May in Las Vegas, and tickets are still available for now. Also, the Adapta Summit in August in Sao Paulo, Brazil. And finally, the Enterprise Tech Leadership Summit in September, back again in Las Vegas. If you'll be at any of these events, definitely send me a note, and let's meet up in person.

For now, I hope you enjoy this conversation about building and controlling AI coding agents, which I find to be both techno optimistic and also sociopolitically optimistic, with Amjad Masad, founder and CEO of Replit.


Main Episode

5:39 Nathan Labenz: Amjad Masad, founder and CEO at Replit. Welcome to the Cognitive Revolution.

5:43 Amjad Masad: Thank you.

5:44 Nathan Labenz: So I'm gonna talk as little as possible. I wanna give you as much airtime as possible and just catch up on everything that's going on at Replit and how you're thinking about the rapidly evolving vibe coding landscape. I don't know if you like that term or not, but we have recently done episodes with the CEOs of Bolt and Lovable. And, obviously, companies in the AI assisted coding space broadly are going vertical. I wonder if you would maybe wanna start by just giving us an update on just how vertical Replit is going right now.

6:14 Amjad Masad: Yeah. We're growing really fast. I think sort of uncomfortably fast in some cases. And I don't pay as much attention to that as to the impact that we're having, because there is a lot of noise. You know? There's a lot of people trying things and jumping between things. But are we having actual business impact? Are people's lives being changed? I watch the anecdotal more than the data itself, although everything is going well in the business. But just seeing people make life changing money, seeing people who had ideas locked in their heads for, in some cases, decades, being able to finally build them, and seeing businesses become more efficient, change how they work, empower employees who otherwise wouldn't have been able to contribute meaningfully. So all these things get me fired up, and those give me a better picture of the business and the long term viability than merely going vertical.

7:18 Nathan Labenz: It's often said in the AI era that there are no moats. I'm on record as a defender of the existence of at least some moats. How do you think about Replit's moats today?

7:30 Amjad Masad: Well, I think Silicon Valley just uses it as a stand in word for having some competitive advantage. They don't tell you whether it's lasting or not. Like, people say, oh, your platform is a moat. What does that mean? So there are a few ways to create lasting competitive advantage. By the way, there are also ways to create near term competitive advantage, and that's being excellent at execution, having insightful ideas, having a really great, talented team. All these things contribute to, at minimum, a short term lead. Now you can keep that short term lead as long as you keep the intensity high, you keep the culture good, and all that stuff. You can also have a lead in terms of how far you've built in terms of your system, platform, and engineering ecosystem. So at Replit, we have 7, 8 years of building the platform, and that has helped a lot to move in this direction and have what I think is a differentiated product.

And now in terms of real moats - and this is the moat that will last 30 years, that will defend against the onslaught of even the biggest companies with the deepest pockets - those are hard to do. And there's been a lot of academic work on what it means to create technical and otherwise market advantage. The most interesting book in this space is 7 Powers by Hamilton Helmer. He basically makes the observation that if you look at the best companies in the world, the lasting companies that can compound growth over time with little competition, they tend to have one of 7 - you might call them moats, but he calls them powers. And those are things like switching costs: companies that get a lot of adoption, but it's hard to switch out of, like, say, Oracle. Oracle has been sort of trading on its early success and continuously extracting more value from the installed base. Microsoft is sort of the same. They have this large installed base, and they can continuously sell them more stuff because they're deeply embedded, and there's really high switching costs. Or network effects - everyone knows about this one. It's hard to build. People build a little social network and then convince themselves that they have network effects. More important than network effects is network economies, meaning that it's not only some subjective value that you get out of the other participants of the network, but also real economic value that's being generated by participants in the network.

So based on a more strict definition of moats, I would say that the moats are hard to see right now because it's so early and the market is so dynamic. The big question has been, does the model layer have moats? And people have gone back and forth. I think the consensus is, no. You might have a lead. You might have a 2, 3, 4 months lead, but you need to continue innovating. The shelf life of models is low, and you need to continuously pour money into it. We can watch also how the model companies are acting and how they continuously feel like they have to move upwards and downwards, meaning OpenAI going to the application layer, but also going into the hardware layer, going down to a starter base. So clearly, based on their behavior, they don't really think that there's much defensibility there.

On the app layer, I think it goes back to more traditional kinds of questions. So in the coding space, are you actually building something that companies will depend on deeply and build roots into, such that it makes it harder to switch? Is what you're building very easy to compete with? Like, now we have Figma, Canva, and others building similar products to Lovable and Bolt, where you give it a Figma design and it generates a UI. It seems like it's gonna get really competitive there. They all do the same things. They connect to Supabase and what have you. Replit's approach has been to build on our 7 years of infrastructure build out: both the runtime environment that we use to run the agents - to be able to spin up virtual machines, install packages, change the operating system even, configure databases and object storage, and even have access to underlying facets and primitives such as fork and rollback - and then being able to access the deployment environment that we also built from scratch. So I guess, in summary, it is not entirely clear where the moats are. I think it's starting to shape up, but I would watch the kinds of things that I just talked about.

12:08 Nathan Labenz: Yeah. I think I see it pretty much the same way, obviously, from a much farther distance, but it's been striking to me that I've seen this pattern in a bunch of different spaces - and you guys were on a very attractive curve already even before AI coding assistants became viable - but I've seen many companies that were sort of trying to solve some hard problem, not really cracking into the adoption that they wanted. I would put my own company Waymark in that camp. And then the layering on of the AI assistant totally transformed the accessibility of the product, totally transformed what we could do in terms of distribution and the value that we were creating. But then it actually turned out that while we were right to put all our energy into that for a while, once multiple players have it - and this is happening probably in the most visible form in the AI coding assistant space - then it's like, now we're back to all those other things being the things that really matter. Like, do you actually have the ability to install the full range of packages? Do you actually have some path to deployment that makes sense? And so as much as I have enjoyed and do continue to visit all of the vibe coding apps - and sometimes it's like spin them all up at once and see who gives me the best UI back - what keeps me coming back to Replit as the anchor vibe coding platform, if you will, is that, first of all, I can do a full stack thing. I don't have to have my laptop open, and I can actually graduate to a deployment if that happens. And that stuff is the hard work to build, and the models themselves are not quite yet to the point where they're gonna do that. Let's talk about the models. At one point, you guys trained your own model, working with MosaicML. Now you have Claude 3.5 powering your agent.

13:55 Amjad Masad: We've upgraded.

13:56 Nathan Labenz: Oh, you have upgraded? Yeah. Okay. I wrote these questions like 3 weeks ago, and at that time, it was still...

14:03 Amjad Masad: Well, the Assistant is 3.5. If you look at the Agent - if you start a new project, the Agent is the default - it is 3.7. So the difference between Assistant and Agent: the Assistant is more like request response, very similar to Cursor Composer. The Agent, which was the first breakthrough agent in the space, works indefinitely and has the autonomy and freedom to take multiple steps and look into different files and aggregate some context before making an edit. So that's the main difference between them.

3.7 was really hard to make work for Assistant, and you've seen that with Cursor and other companies. 3.7 was trained to be more autonomous, to work better in an environment where it has more freedom. And it worked really well with Agent because we were starting to work on what we call agent v2. And v2 was about actually removing some of the rails we had around the model and giving the model more time to work. So we reduced the number of steps where we think the model is going crazy, going into a death loop. And we also added a reasoning step just to make sure that the model is not going off the rails, but we would let it work for 5, 10, 15 minutes at a time. And 3.7 is really good at that. It is very curious. The reason why a lot of engineers don't like it is because it wants to be the engineer. It doesn't want you to make decisions. It wants to be you in a way. So it works for Replit because, again, we're more on the vibe coding side, and people don't need as much control as other sorts of users. So, yeah, we love 3.7.

15:49 Nathan Labenz: Interesting. How do you think about when - another thing I've noticed and used is that you can now, just with a click, open up your VS Code or open up your Cursor and then SSH into your REPL or your app now we call it. Right? And code from your IDE. How do you think about that? Like, was that a partial win or a partial loss, or do you not care whether people fire up Cursor to code on top of Replit?

16:14 Amjad Masad: 95% plus of people aren't gonna do that. So the way we think about it - and it just reminds me of my open source days, when we were working on React JS at Facebook - when we were trying to gain adoption, we had this thing called escape hatches. So when React doesn't do the right thing, you might wanna go into jQuery. And so we had certain hooks that allowed you to go into the underlying library or whatever you're using. And that made sense when React was the least dominant player in the space, and at some point it stopped making sense.

In a similar way, when we added VS Code and Cursor support, Replit was still sort of not growing as fast. Our AI was not in a good place, to be honest. As we were investing in Replit Agent, our AI features rotted a little bit and degraded; we were just so focused on building Replit Agent for almost the entirety of 2024. And so in some cases, we felt like it was a good idea to give people an escape hatch. I think it's useful in that if you're part of a team and you're the vibe coder and you build something in Replit, but you want an engineer to come look into it and work on it and use it, then that's a good feature to have. But, honestly, it's neither here nor there. I don't really think about it too much. Like, I don't want people to leave Replit. So if you really need to bring up Cursor or bring in someone who uses Cursor, I think it's better if you stay on Replit and use Cursor.

17:56 Nathan Labenz: Hey. We'll continue our interview in a moment after a word from our sponsors.

[Ad break]

20:24 Nathan Labenz: How do you evaluate things in today's world? And I guess, you mentioned 3.7 is hard to make work as an assistant. How rigorous are you in that assessment, or is that more of a vibe check type of dynamic?

20:42 Amjad Masad: The gold standard is an AB test. You can learn a lot from an AB test, especially if you have a good AB testing framework and good KPIs that you care about. And so it's the most revealing thing that you can do. And we've been doing it since we started working on our model. So I would say that's the most important thing. I guess this is what any company with any significant amount of users and infrastructure does. Like, I'm sure ChatGPT is AB testing all the time.

In terms of evals, yeah, we have - like you said, we were training our own models, and the research team that we have is now more focused on making evals, as many research teams are right now. So we do have a lot of effort going into making evals. We have some basic evals. We run the standard ones like SWE-bench and things like that. But we're building more things that are Replit specific. So it's never as much as you'd like. It still feels inadequate and somewhat incomplete, but we're fairly quick at evaluating new models. Like, we looked recently at Gemini 2.5. And while it's impressive in all sorts of ways, it is actually not as good at doing long trajectories, long agent trajectories.

So that's what you want. You wanna be in a position where you're quickly evaluating models as they come out. But, Nathan, the hard thing, I would say, is not evaluating models against current capabilities. It is figuring out what new things a model can do that go beyond the previous version. And this is more of an art than a science. And I believe I sort of coined the "vibe test" back when we were training our own model. But you wanna sit with the model and figure out what it is able to do.

And like with agent v2, we kind of knew that the direction the labs were going and how they're optimizing their models was gonna be towards more autonomy. So for example, with v2, we are actually way less descriptive in terms of how the agent should work, and we actually try to give it less context. With v1, we would spend a lot of time and effort on RAG and things like that. With v2, we're trying to get the model to decide where it looks. It's actually very good at grepping. That might be better than RAG and putting things in context, because letting it follow its own path is gonna put it more in distribution, I guess, is one way to say it.
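The idea here - expose a search tool and let the model pull its own context, rather than pre-assembling context via RAG - can be sketched as a simple grep-style tool the agent would call. This is a minimal illustration, not Replit's actual implementation; the function name and interface are hypothetical:

```python
import os
import re

def grep_codebase(pattern, root="."):
    """Hypothetical agent tool: return file:line matches for a regex across
    a project tree, so the model can pull its own context on demand instead
    of relying on pre-selected RAG snippets."""
    matches = []
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    for lineno, line in enumerate(f, start=1):
                        if regex.search(line):
                            matches.append(f"{path}:{lineno}: {line.rstrip()}")
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
    return matches
```

In an agent loop, the model would emit a tool call like `grep_codebase("def handler")` and receive the matching lines back as fresh context for its next step.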

23:34 Nathan Labenz: Yeah. Let it play to its own strengths in a sense. That has been striking. I've been looking at a lot of open source agents that different companies have put out recently, and for the most part, they are amazingly simple. Like the OpenAI Codex, it's really just like one thing. It's one prompt and the thing has command line access, and that's pretty much it. There's not really a lot to it. It really is the model.

Now, of course, I actually don't know to what degree they're using that internally and actually getting value from it. A lot of these simple frameworks that I see put out, there's a SWE-bench score attached, it's in the sixties or whatever. It's like, okay. That's really interesting. It's a remarkable flex of the model. I would assume that people do more to actually try to make it work in a specific environment. But it sounds like you're saying, no. Not really. It really is a pretty minimalist setup, and the model itself is delivering the vast majority of the value. And you don't even really have to work that hard to assemble context for it if I understand you correctly because it is good at poking around on its own.

24:42 Amjad Masad: I think that's overstating it. I think that you would want to provide a habitat for the model that is as easy and as optimized as possible, in the same way you'd wanna do that for a developer. So if you put the model in an environment where it can install native dependencies, like you can do in Replit, versus on your MacBook where you can't do apt-get or something like that, it will perform worse. Right? So the tooling that is available, and how good the system is, matters. The resiliency, reliability, all of that stuff continues to be important.

But instead of building the system in a semi deterministic way - which is how agents were built in 2023 and 2024, based on a prebuilt, state machine-like programming model; I think state machines are useful, but you don't wanna make them too fine grained - you would want great tools. You would want to give the model really great tools. If errors happen, you wanna give it great errors that are as explicit as possible. You'd want it to have as much observability of the environment as possible. It should be able to get errors from the browser, which is really hard to do with OpenAI Codex or Anthropic Claude Code.

So it ends up being this interesting mix: you need to build very stable, reliable infrastructure, and you want sources of context that are very rich and very timely. But at the same time, you would want to put the agent in a VM and let it use the tools it knows best. So the environment really does matter - I guess more than before, in a weird way, because the agent knows how to use these environments. And so the more robust the environment, the better the agent can perform.

27:09 Nathan Labenz: What would you say are maybe some of the most important or interesting or surprising additional lessons you've learned about agent architecture? I mean, we've got all these sort of proto paradigms in the field right now, whether it's MCP, or - I just did an episode not long ago with Andrew Lee from Shortwave, who basically said, we just use Claude, we let Claude have these long running episodes, and we basically don't get in its way, and that's where we get the best results. But then you have the OpenAI Agents SDK, where they have the paradigm of handoffs, and Harrison from LangChain has also been vocal recently that you wanna have a sort of front end triage agent and then specialist agents that sit behind that. I don't know how much you wanna tell us about the underpinnings of the Replit agent, but where do you come down on some of these questions that right now I don't think there's any consensus on in the field?

28:03 Amjad Masad: What is the agents SDK? Is that the Google one? Or...

28:05 Nathan Labenz: That's OpenAI's where they have the handoff primitive, where the goal is to be, I think, easier to test. Right? If you have a customer service agent versus whatever other kind of agent, you can just test it just in the customer service setting because it's gonna be handed off to for that role, and then it will hand off back when it's done with whatever its job is.

28:29 Amjad Masad: Yeah. I think it's important to decouple agent communication - like MCP and maybe the Agents SDK - from agent architecture. Yeah, they're related, but you still have to build a coherent single agent to do a certain specified task before we figure out how to decentralize and have it communicate, which is a fascinating thing. MCP is really interesting, but it doesn't have much bearing on how you architect the core agent.

Again, like we said earlier, it is easy to overarchitect it, because that's how we did it in '23 and '24. I think what Harrison and LangChain have done with LangGraph is a good first approximation of how these things should work. It needs to be some kind of node, graph based structure. You can think about it as a state machine where the agent goes from searching or collecting context, to editing, to running, then back to observing or reacting, and then finally back to searching. And that will help you provide the right tools. It'll help you debug. And also, if you did it correctly, it'll improve reliability, because you need to do it in a way where, when the underlying machine crashes, for example, or there's some error, you're able to revive the agent at the right node with the right state and the right context. And I think having some separation of these nodes and states is gonna be important.

And also, you still do some deterministic work. Right? For example, when you're at the react node or the observe node, you wanna be able to bring in new context from the browser window or what have you. Or in some cases, when you're running, you don't really need the model to do much of the work there. You can just do it deterministically. So anytime you can do something deterministically, it's good that you just do it deterministically.

So graph based architecture, I think as a first approximation, seems like the correct one, but it is easy to overdo it. It is very easy to overdo it and have a 100 node agent. Like, maybe you need 4 or 5, 6 nodes.
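A few-node, graph-based loop of the kind described here might look like the following outline. This is a toy sketch with hypothetical node names and stubbed-out work, not Replit's architecture; the point is the shape - a handful of coarse nodes, shared state that survives between steps, and deterministic steps where no model call is needed:

```python
# Minimal graph-based agent loop: each node is a function that updates shared
# state and returns the name of the next node. Because state is explicit, the
# agent could in principle be revived at the right node after a crash.

def search(state):
    state["context"] = f"files relevant to {state['task']!r}"  # gather context
    return "edit"

def edit(state):
    state["patch"] = "(model-proposed edit)"  # a model call would go here
    return "run"

def run(state):
    # Running the code/tests is deterministic; no model call is needed.
    state["errors"] = []  # pretend the patch ran cleanly
    return "observe"

def observe(state):
    # Feed environment output (browser errors, test failures) back in.
    return "search" if state["errors"] else "done"

NODES = {"search": search, "edit": edit, "run": run, "observe": observe}

def run_agent(task, max_steps=20):
    state, node = {"task": task}, "search"
    while node != "done" and max_steps > 0:
        node = NODES[node](state)  # follow the edge the node chose
        max_steps -= 1
    return state
```

With only four nodes, the graph stays debuggable, and the observe node closes the loop back to search whenever errors surface.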

I think where we're having trouble is doing it at scale: having the infrastructure, having it run for 15, 20, 30 minutes - at some point in the future, we want it to run for an hour - and having it tied to other resources like a VM. This is where it gets really tricky. And I don't think there's much information about this in public, because it is hard to get to that stage. Very few companies do. Perhaps us and maybe now Devin and others. I don't know if they're at our scale, but there are very few true agent companies that are running at scale. And so we're dealing with all these problems. Luckily, we have an amazing infrastructure team that we built over time. But it just tends to look more like a distributed systems problem. You have to have consistency. You have to have some way of talking between these systems. And so this is where we're at. We're probably gonna write more about this, but I think this is the main challenge.

31:40 Nathan Labenz: To what degree do you think people at enterprises should be building agents? I'm struck by the fact that you guys have such a diverse set of needs from customers. The idea is anybody can come in and build anything they want on the platform, and that lends itself to this open ended, relatively unstructured agent approach: I hope the model can get you there, and if it succeeds sixty or seventy percent of the time, that could be an amazing win. Most companies that wanna use AI to do stuff would not be happy with sixty or seventy percent; they're probably often aiming for high nineties. They wanna be at least as good as the human that was doing the job before. And so it strikes me that the more, almost overengineered patterns you guys were doing last year and are now trying to graduate from are maybe still the right patterns for a lot of people, if they're working in an environment where the task is narrower and the success criteria are higher. Does that seem right to you?

32:44 Amjad Masad: Yeah. I think the main distinction is - let's just pick an arbitrary number, either minutes or tool calls - do you want the agent to run for, say, 5 minutes plus? If you do want long-trajectory agents, then you want the systems to be as pure as possible so that the agent has more freedom to go down the trajectory it finds most compelling. But if the kind of agent you're trying to build is on the order of 3, 4, 5 tool calls and then come back, yeah, I would really simplify it. I would use an agent framework out of the box; a lot of them just work well.

But the way I define agents, a fundamental feature is that the agent needs to decide when to halt. If you have a preset notion of when to halt, I would go as far as to say that's not an agent. It is just a request-response paradigm, which is what ChatGPT sort of does now: you ask for something, it might do a tool call or two, and it comes back to you. Some people might consider that an agent, but for me, it's just request-response with some tool calls in the middle.
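That distinction can be made concrete with a toy sketch. The `model` function here is a stub standing in for an LLM; everything (function names, the "finish" action, the fake tool results) is illustrative. The request-response flow halts after a preset number of tool calls, while the agent loops until the model itself emits a halting decision.

```python
def model(history):
    # Stub policy: keep calling tools until enough context is gathered,
    # then choose to halt. A real LLM would make this decision itself.
    if len(history) < 3:
        return {"action": "tool_call", "tool": "search"}
    return {"action": "finish", "answer": "done"}

def request_response(prompt, max_tool_calls=2):
    # Halting is preset: exactly max_tool_calls, then answer. By the
    # definition above, this is not an agent.
    history = [prompt]
    for _ in range(max_tool_calls):
        history.append("tool result")
    return "answer after fixed tool calls"

def agent(prompt):
    # Halting is the model's decision: loop until it emits "finish".
    history = [prompt]
    while True:
        decision = model(history)
        if decision["action"] == "finish":
            return decision["answer"], len(history) - 1  # answer, tool calls used
        history.append("tool result")

print(request_response("fix the bug"))
answer, tool_calls = agent("fix the bug")
print(answer, tool_calls)
```

The only structural difference is who owns the loop condition: the caller (request-response) or the model (agent).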

[Ad break]

38:26 Nathan Labenz: I like that definition. I've been working on my own definitions recently, and I think that has a lot to be said for it. I think people in general are really enamored with the idea or attracted to the idea of autonomy when in many, many contexts, it's not really what they need. So it's helpful to get clear on some of these distinctions. Yeah. What is the hardest part for the agent today? And do you have a point of view? I mean, we talk about running for 5 minutes. I experience this all the time where I've never been more ADHD because I've got 5 AIs doing 5 different things, and I don't wait for any of them to come back. Instead, I task switch at the scale of a couple minutes. Basically, prompt, prompt, prompt, come back, come back, come back. And I feel like Devin is a good example where I remember one time coming back to Devin, and I was like, well, what the hell have you done, Devin? I have no idea. I don't even know where we are right now in this journey. So, yeah, to put that into questions, what are the hard parts? Like, what sort of success rates do you guys see or do you think people should consider to be achievable? And what's the right UI to present these long trajectories so people even know what the hell just happened?

39:42 Amjad Masad: Man, if we figure out these questions, it would be a trillion-dollar company. These are the right questions. This is basically what we're working on. No one knows what the proper UI for building software with AI is. The Cursor UI is obviously overwhelming: it is code centric, which I don't think is the right UI. And what every prompt-to-prototype company is doing - Figma's entered the space, even Canva - is oversimplistic. It is one chat, one preview.

The Replit model sits between an IDE and those simpler interfaces: it does look like a chat plus preview, but you can also open the files, open tabs, and configure the environment in all these different ways. I think it's closer to where it should be, but it still looks more like an incremental move than a full reimagining of what these interfaces should be.

I think one principle is that we need to treat vibe coders, or anyone who's making software with AI, like developers. That might be counterintuitive, because a lot of companies, including ours, are saying we're explicitly not servicing the kind of professional developer that IDEs are servicing. But they are still developers, and it's a mistake to think they're non-developers. The reason to think of them as developers is that it puts you on a path of creating tools, as opposed to a model where everything is linear. Creation is nonlinear. Why would AI software creation be linear? There's no reason for it to depart from how people have used computers all this time. People like tools.

The reason Photoshop is so complex is that people needed those tools over time. We always make fun of these complex tools, but the reason they're complex is that they are satisfying power user needs over time. It is hard to do with taste, which is why these companies at some point get to a point where they're completely unlearnable. So to do it with taste is incredibly hard, and to do it in a way that doesn't alienate the user who's just coming to your platform, just trying your app, is even more difficult, because you wanna keep it as simple as possible for the first 5 minutes. And then in the 50,000th minute, it needs to be habitable. What I mean by habitable is: you're living in that software system. You need to be able to configure it. You need to be able to understand it. You need to be able to make it feel like it's yours. You need to be able to access different tools, perhaps make your own tools.

When I was a full-time programmer, I really loved using Emacs because it was habitable. I could live in it. I could make it my own. Now, I'm not saying we should rebuild Emacs in Replit, but I think we need to provide power tools. Solving for all of that is what we're doing day to day. That's what we're working on.

You bring up another important point, which is how do you convey what the agent has done, especially if it's working for up to an hour, which is where I think we're headed. The question is at what level of fidelity you wanna explain it. Are you dealing with a programmer, or are you dealing with a vibe coder? Because there are two different answers.

I think for the latter, which is what we care about, you'd want some kind of view that shows you a trail of what the agent has done and allows you to go back in time and look at the difference, whether that's a difference in the UI of the app or a difference in the code, if you really want to look at the code. So it's similar to a git tree, but it also lets you look at the results and the conversation the AI is having. Some kind of visualization plus some kind of time travel feature seems necessary there, to be able to go back and forth and understand what has happened.

I think some kind of log seems important that is both human and agent readable. So as the agent is working, it distills decisions into single lines and puts them in a markdown file that the user can go look into to understand what decisions the agent has made, and that the agent can refer to in the future as well. This idea of a knowledge base that's readable and writable by both the user and the agent seems really important.
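A minimal version of that shared log is easy to picture. This sketch is illustrative only: the file name, the one-bullet-per-decision format, and the helper names are assumptions, not Replit's implementation. Each decision becomes a single markdown bullet that a person can skim and the agent can read back as context.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical shared knowledge-base file, readable and writable by
# both the user and the agent.
LOG = Path("agent_decisions.md")
LOG.unlink(missing_ok=True)  # start fresh for this demo

def log_decision(summary: str) -> None:
    """Distill one decision into a single markdown bullet."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    with LOG.open("a") as f:
        f.write(f"- [{stamp}] {summary}\n")

def recall() -> list[str]:
    """Read prior decisions back, e.g. to feed into the agent's context."""
    if not LOG.exists():
        return []
    return [line[2:].strip()
            for line in LOG.read_text().splitlines()
            if line.startswith("- ")]

log_decision("Chose SQLite over Postgres: single-user prototype")
log_decision("Disabled caching while debugging stale responses")
print(len(recall()))
```

Because the log is plain markdown, the same file serves both audiences: the user opens it in an editor, and the agent includes `recall()` output in future prompts.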

I think quality of life things like notifications and progress are important. We have a mobile app, which is a really great asset. So if you install it and you start using Replit, it'll actually send you a notification when the agent is done working. We're introducing some new features where it shows you live activity, like when you order an Uber and see where it's going, or order DoorDash. Same thing with the Repl: it'll tell you what the agent's doing and where it's at, so you can look at your phone whenever you want and just see what it's doing. It'll send you a notification at the right time for you to intervene. If it just needs an answer from you, you can just text it on your phone. So we want the interaction to be completely fluid. We want it to be ambient. Wherever you are in the day, you should be able to manage the agent as if you're managing your employee.

So, again, it comes down to the analogy of working with a coworker or working with an employee. I think that continues to be a very useful analogy. It breaks down in some cases, obviously. You don't wanna take it too far, but I think it's a first approximation of what the user experience should be.

45:44 Nathan Labenz: Yeah. I can also imagine layering onto that - and maybe you are starting to do this - having the agent actually visit the website and do the testing I currently do in my own loop. It's like: prompt, get something done, confirm via a procedural test that it worked. I would love to see the AI start to do that and be like, I know this at least worked because I went and did the thing it's supposed to enable a user to do, and I was able to do it. I haven't seen that loop closed much at all yet. I don't know how much time you spend with computer use models.

46:18 Nathan Labenz: Operator's actually getting pretty decent now. I've just been using it this week a good amount.

46:22 Amjad Masad: Yeah. They're improving incredibly fast. I think in the next 3 months, they're actually gonna be usable. But right now, they're clunky. They're expensive. There are sites they just suck at - I think there's something about the training data that doesn't generalize that well. But I do think it is the next big unlock, not just for us, but for the entire AI space.

If you think about the market for AI, obviously there's the consumer market, there's the coding market, and then there's the routine-work labor market, which I think is huge. I think it is the next place where a lot of startups, a lot of value will get built. Everyone's focusing on the customer service market right now, which I think is probably the wrong one. Yes, I think all these companies are going to do well, but the bigger opportunity is that there is a lot more routine work out there - mind-numbing routine work that we can get rid of, and even the people doing it don't like doing it. That is QA, like you said. It's the promise of RPA that was never delivered: RPA-style automations, data entry. All these things together are probably a trillion-dollar market. The reason it hasn't been unlocked is that computer use hasn't matured yet. I've funded some startups in the space, and there's a lot of excitement. But in some cases, founders have told me that a worker in the Philippines can be cheaper and faster than Anthropic's computer use. Obviously, though, the technology is gonna get better, and I think that's a good bet to make.

48:05 Nathan Labenz: Yeah. The trends are pretty clear. Alright. I know you don't have a ton of time left. I'll ask a couple questions in a bunch, and then you can take them in any direction.

One, you've obviously recently made some waves by saying you don't think people should necessarily learn to code, or at least you don't think that in the same way you used to. I wanted to ask about the bounties program as a lens on that, because I thought that was a really innovative program. I wonder to what degree your new take is informed by what you're seeing in your own owned-and-operated marketplace, and how much bounty-type activity is just not getting sent out to humans anymore because the AIs can do it. So that's one dynamics question.

And then the other big dynamics question: I recall you saying that Replit is the perfect substrate for AGI, and especially with computer use coming quite soon, I think that really is true. If I were an AGI looking for a platform to make my home, whether or not I was welcome or invited in, I think Replit would probably be my number one draft pick. So I wonder what you're doing to prepare for that. What happens in a world where other people's AI agents come to Replit and can actually use the interface and can spin up things and can code things? Are you prepared for that? And I guess, are we as a society more broadly prepared for that? So that's a lot. You can take the rest of the time.

49:30 Amjad Masad: So on the first question: yeah, I think it was partially taken out of context and partially overstated by me. But the point is something like this: there are a lot of people who had to learn to code in order to build things, and what I'm saying is you don't have to learn to code in order to build things. Do you wanna learn to code to go to computer science school and come to Replit and work on distributed file systems? Yeah, we'd want you to do that, and go learn distributed systems and algorithms and data structures and all that. But if you're an entrepreneur trying to build products, I don't recommend you go learn to code. I recommend you go learn how to make things. That is vibe coding. That is using AI to do design, to do marketing. You can be a full stack generalist entrepreneur, and that's awesome. It's a great time to be alive.

I think increasingly, product engineers will need less of a computer science degree. I think that's already the case: a lot of companies are hiring people who can make really great products with AI and with a lot less training. It's still gonna be a little while before AI gets good at embedded systems, or before we're comfortable with AI writing code for NASA or something like that. But if you want to build things, learning to code is not the first thing I would do. I would just go do them. Go build things. That's the way to do it. Just jump into it.

In terms of AGI, it's really getting hard to understand what people mean by AGI. In a way, Anthropic's Claude 3.7 is kind of AGI. You can drop it in an environment like Replit and watch it explore the file system. You can watch it take a high level objective, break it down, backtrack, go down different trajectories, get new information, and adjust its trajectory based on that new information. It's really fascinating to watch. It is not general in the sense that if I dropped it into my kitchen, it would not be able to wash the dishes. But - let's say your question is more like - actually, maybe I'll pass it back to you. What do you mean by AGI in this context, given what I just said?

52:03 Nathan Labenz: Well, I guess what I'm interested in is: are you starting to see any early signs? At the research level, we've certainly seen documentation of scheming behaviors, for example. We've seen lots of reward hacking. We've even seen Sakana come out with a CUDA engineer that was supposedly, whatever, 10 times better, and then they were like, oh, sorry, it hacked our reward function. And I feel like you have created a platform that - I don't have a super clear line either around exactly which thresholds are the most important ones, but it does seem like we're passing a lot of thresholds. And some of these thresholds seem like the ones that are going to unlock the ability for the AI to kind of run amok on a computing fabric like yours. I wonder if you're starting to see any of that, or preparing for it, or feel like you're ready for it.

52:54 Amjad Masad: Let me tell you about some funny incidents we've seen. In Replit, we have a configuration file called .replit, and we don't want the AI to edit it because it can easily break the system. Not just break it, but really break it: the user can't get out of the error that the agent has created.

So initially, prompting it and telling it not to would not work, because at some point it gets convinced that this is the only way to solve the problem, so it ignores your prompts and goes and edits it anyway. I was like, okay, we're just gonna throw an error when it tries to edit the file. We did that: we throw an error, and in the error we tell it, don't edit this file. And it still, at some point, gets to where it's like, I really need to edit this; this is the only way I'm gonna solve this problem; so I'm gonna write a script and then run that script to edit it. And it worked, because I think it spins up a different Linux user that had permissions. I was like, oh, fuck. It's getting around the protection.

So then we created a real sandbox where you really can't edit that file. And it hit all these issues, and then it's like, I'm gonna social engineer the user into editing this file. And it goes back to the user, like, hey, here's a piece of code, you should put it in this file. We were like, motherfucker.

So, yeah, there are some early signs of that sort of behavior. And the way to think about it is, my starting point is more of a skepticism about being too concerned with AI safety. So when I look at these instances, I just see single-minded goal orientedness and some creativity in getting to that goal, in a sort of savant, dumb way. Could this be dangerous? Yeah, I think in some cases it could destroy data. It could harm the users. In some cases, you really wanna care about this. Could this create a catastrophe? I just don't see it yet.

Are we preparing for it? We are prepared for it in the sense that we have human actors trying to hack Replit all the time for their own needs. We have people doing crypto mining. We have people trying to attack other websites. Unfortunately, Replit used to be a lot more open and had fewer limits, but the amount of abuse that humans throw our way has made it so that we had to close some systems down, add a lot more protections, and limit a lot of things. So I don't see AI as being any different from battling bad human actors.

Look, I'm always willing to update my view as we watch and use these systems. If I felt like their ability to scheme and to misunderstand objectives and goals got to a point where they're actually potentially doing really destructive and harmful things, then I'd think we need to invest more in security and safety. We do invest a lot still, but maybe we'd invest more.

56:12 Nathan Labenz: Yeah. Well, it seems like it all comes down in some very basic sense to, do they actually get meaningfully superhuman? If they top out at the same level as all the human hackers that you've been dealing with for years, then it's just more of the same. If they go Move 37 cybersecurity edition on us, then you could have a real different kind of challenge on your hands. Right?

56:40 Amjad Masad: Yeah. It's interesting. A good question is, have we seen Move 37 in AI engineering or in autonomous coding agents? We haven't yet, but it'd be interesting to see that.

56:51 Nathan Labenz: Yeah. I would agree. I don't think we've seen it yet. Certainly, if you listen to the likes of Dario, we should expect it in the not too distant future, but that's exciting.

57:00 Amjad Masad: I'm always sort of updating my views as we see new models. So far, I haven't been surprised. The biggest surprise I've had was GPT-2 to GPT-3. Since then, even GPT-4 wasn't a huge update for me; I just felt like we were on the trajectory of what seems possible with these systems. In fact, it's been a little underwhelming in some cases.

57:24 Nathan Labenz: Yeah. I'm more confused, honestly, right now than ever, I would say. In some sense, the progress seems breathtaking. You look at FrontierMath and you're like, this thing's definitely way better at math than me and everyone I know, and maybe even better on a FrontierMath test than all but the top couple mathematicians in the world. That's insane. You look at some of the stuff Google puts out around the co-scientist, and some of its abilities, especially when you give it the inference time budget to grind through the scientific literature and come up with new hypotheses, are really elite. And then I still can't get any model to reliably solve a tic tac toe problem that I give it. And so it's just straight up super weird.

Maybe in closing, because I know you do have to go: what is your expectation? We've got the Dario story; let's use that as the point of departure. If he's saying a country full of geniuses in a data center by 2027, it sounds like you're taking the over, but where do you think we will be in, say, 2 years?

58:24 Amjad Masad: My starting position is that LLMs are a function of the data. In any area or system where you have really great data that approximates human level performance or beyond, you can build really successful AI agents. If you have closed systems where the outcome is verifiable, such as math, I think we're gonna get superhuman math geniuses. Coding is sort of the same. Coding is not an entirely closed system - you have to deal with a lot of side effects and other systems and all of these things - but you can get pretty far, which is what we're betting on.

Self driving is one area where you can see that it is really hard to make progress, because it is not a closed system. The data we get is sort of median human driver data, and you make progress only incrementally. So I really think it's a function of the data environment and of how verifiable the outcomes in that environment are, which is what determines whether you can do large scale RL. You'd have to ask me about a specific field and how far we'd get in it, but that would be my principle for it.
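The "verifiable outcome" idea can be shown in miniature. This is a toy sketch, not any lab's actual reward function: in a closed domain like arithmetic, a reward can be computed exactly from the environment itself, which is the property that makes large scale RL tractable there and hard in open domains like driving.

```python
def verifiable_reward(problem: str, answer: str) -> float:
    """Exact reward for a closed, verifiable domain: an arithmetic problem
    has one checkable answer, so the environment can grade it perfectly."""
    expected = str(eval(problem))  # e.g. "2 + 3" evaluates to 5
    return 1.0 if answer.strip() == expected else 0.0

# A model's output can be scored with no human in the loop:
print(verifiable_reward("2 + 3", "5"))
print(verifiable_reward("2 + 3", "6"))
```

For driving, no such function exists: there is no `eval` for "was that a good lane change?", so the signal has to come from noisy human data instead.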

59:30 Nathan Labenz: You're definitely right to zero in on large scale RL as the next big question mark. That's the thing that gave us Move 37, right? And in general, it seems like it's often the thing that gets us to superhuman performance.

59:42 Amjad Masad: Does it lead to transfer learning, though? Like, can you learn Move 37 but also do Move 37 when driving a Tesla, something like that? Actually, we haven't seen evidence of real transfer learning in these RL environments.

59:58 Nathan Labenz: Yeah. In some sense, I kinda hope that continues. I know the companies wanna see that generalization, but I'm like, siloed geniuses may actually be in some ways more controllable, more useful, more stable.

1:00:10 Amjad Masad: I kind of agree, actually. I kind of agree. Although I'm in a way an accelerationist, but I think the acceleration we're on is good enough.

1:00:19 Nathan Labenz: Yeah, it's pretty good. I'll take it. The great stagnation is certainly over, and I'm sure you've been in a Waymo or a Tesla not too long ago. That's getting damn good too, man. My first Waymo experience was breathtaking, and most notable to me for how quickly it faded into the background: I was checking my phone, and they had put me at ease in a way that really felt like the future. That was months ago, but I was super, super impressed. Alright. Thanks for taking the time. This was fun. Do you wanna leave folks with anything before we break?

1:00:53 Amjad Masad: Go check out Replit - it's everything we talked about. Just watch the agent work, especially when it hits bugs. It's really fun. Sometimes it's like watching a movie. It's really entertaining to see how it's thinking and how it's doing problem solving, and we show all of that.

1:01:10 Nathan Labenz: Love it. Well, thanks again for doing this. Amjad Masad, CEO of Replit. Thanks for being part of the Cognitive Revolution.
