The AI Assistant Revolution with Flo Crivello of Lindy.AI

People have long been imagining AI assistants. Flo Crivello is turning that dream into a reality with the ambitious project Lindy AI. Flo sits down with Nathan on Lindy's announcement day to talk about this unique moment in AI, Lindy's capabilities, single-use apps, and how he sees the global impact that AI-powered employees can have on the future of work.

Anyone who mentions "The Cognitive Revolution" when signing up for the Lindy beta will get priority access.

(0:00) Preview
(1:05) Sponsor: Omneky
(4:50) Flo demos Lindy
(9:21) Security and user experience
(11:55) How this AI moment compares to the Industrial Revolution and single-use apps
(15:00) The dream AI assistant
(19:54) Competition and advantages of startups
(24:28) Cost vs Quality
(31:19) How Lindy uses user data
(32:37) Context and data injection
(37:26) Comfort with ambiguity
(43:10) Use of guardrails and trust concerns
(47:26) AI safety and edge cases
(49:16) Impact on labor markets
(50:27) Economic resilience
(52:09) Future of work
(55:49) Rapid fire questions

*Thank you Omneky for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.

Join thousands of subscribers to our Substack: https://cognitiverevolution.substack

Websites:
lindy.ai *mention "The Cognitive Revolution" when you sign up for the waitlist to get priority access*
thecognitiverevolution.ai

Twitter:
@CogRev_Podcast
@labenz (Nathan)
@Altimor (Flo)


Full Transcript

Flo Crivello: (0:00) Napoleon had the highest status material of all: aluminum. I'm drinking LaCroix right now, and aluminum has become so cheap we literally throw it away. I'm drinking from aluminum, and then I'm going to throw this away. This would have been unthinkable. I think it's going to be the same thing with code. An app today costs on the order of $10,000 to $100,000 to build. We're rapidly approaching a world where an app costs on the order of one to ten cents to build. Once that happens, you basically start having disposable apps. You start building an app just for this one session that you have right now. That's basically what Lindy is doing.

Nathan Labenz: (0:38) Hello and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week we'll explore their revolutionary ideas, and together we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my co-host Erik Torenberg. Omneky uses generative AI to enable you to launch hundreds of thousands of ad iterations that actually work, customized across all platforms with a click of a button. I believe in Omneky so much that I invested in it, and I recommend you use it too. Use CogRev to get a 10% discount. Our guest today is Flo Crivello, founder and CEO of Lindy, an AI assistant that aims to put your life on autopilot, online at lindy.ai. One of the smart but already increasingly mainstream takes in AI right now is that the next big thing will be actions. That is, taking the AI paradigm beyond generating text and allowing models to use tools to take actions and ultimately get things done. There are a number of ways to go about doing this, but they all involve using a language model to bridge the gap between a user's request, typically specified in natural language, and an action space, which could combine APIs, web browsers, other software applications or interfaces, and eventually even robotic controls, all while taking into account the user's history and preferences, which unfortunately for developers tend to be spread out across email accounts, Slack conversations, calendar history, and much more. People have imagined AI assistants seemingly forever, at least from The Jetsons to Her. But recently they've become a central focus. ChatGPT broke through to the mainstream in part because of its accessible chat assistant style interface. And in the context of AI safety, assistants have been described as a laboratory for alignment. To state the obvious, economically an AI assistant that really works will be incredibly valuable technology. Indeed, Flo talks about Lindy as a virtual employee. And with such a big prize to be won, he'll surely face competition from some of the biggest companies in the world. Alexa and Google Assistant are already in millions of homes, and Siri is rumored to be due for an update in June. Still, Lindy is no hackathon project. It's a serious effort with a dedicated team backed by real resources and critical connections. Flo has raised $50 million, and his team was building on GPT-4 for a month prior to launch. They've got a level of ambition that sometimes creates life-changing products and generational companies. So while Lindy is being announced today and currently still in very limited beta, in keeping with our goal of helping listeners understand and where possible experience the near-term AI future, we did get Flo to agree to prioritize anyone who mentions the Cognitive Revolution when signing up for their waitlist. One note: Flo does give me a screen share to demo the product during this interview. And while that part of the conversation doesn't go on for very long, and I think you should be able to follow it from the audio alone, we do have video available on YouTube. And you can also watch some flashy demo animations on their site at lindy.ai as well. Now I hope you enjoy this preview of the AI assistant future with Flo Crivello. Flo Crivello, welcome to the Cognitive Revolution.

Flo Crivello: (4:25) Thanks for having me.

Nathan Labenz: (4:27) Really excited for this conversation. You are launching something new, and I think you and others who are exploring this space are going to really change the way that people use computers by harnessing the power of AI and making it useful for practical everyday tasks. So tell us about what you're building, what you're launching, and let's get into it.

Flo Crivello: (4:53) Yeah, so we're building lindy.ai. It's an AI assistant that can basically do stuff for you. You can think of it as ChatGPT that can also use your applications and access your data in order to automate your work. It can own your calendar, own your email, book travel, send contracts, scrape the web, help you prospect, help you recruit, all that kind of stuff. We've basically trained a large language model to be able to use tools and to train itself on how to use tools based only on the documentation of these tools. We literally just feed it the human documentation of an API, like the Stripe API or the Slack API, and then it knows how to use the API. We can give it a goal and it figures out how to use which APIs when in order to fulfill its goal. For example, if I say, help me find half an hour with Nathan tomorrow, it's going to first hit the Google Calendar API to find my calendar and my availabilities, and then it's going to compose an email and hit the Gmail API to send you these availabilities by email. And then once you reply, it's going to hit the Google Calendar API again in order to put the time on both our calendars. So that's the idea. We're actually going to

Nathan Labenz: (6:09) Do the screen share, and this will be available in the video version of the podcast. So if you are listening, you can still listen along and hopefully it will be clear enough. But if you really want to see the thing in action, then flip over to the video version and you'll actually be able to see a preview of this product, lindy.ai.

Flo Crivello: (6:27) Here I have a big text field in the center. It's funny, actually the first good name for the product was Googlebot. It's like Google except it does stuff for you. And I'm going to type, find me ten software engineers in San Francisco. I'm actually using Lindy for hiring quite a bit. It's really making it 50x easier for me. So it loads a little bit, and then it asks me, hey, you're going to use some prospecting credits. And then it shows me a list of software engineers. If I click, there's name, current role, LinkedIn. And if I click, I see actual software engineers. So this guy works at Stripe. This person works at Asana and so forth. There's a little bit of debug information here. None of this UI is hard-coded. Basically the AI has UI components that it just mixes and matches in order to do its job at any given time. So now I can follow up with these candidates and I can say, send them a recruiting email to join lindy.ai, a startup building an AI assistant. Or I could say, send them a recruiting email based on this job description. And here I could paste a Google Docs link or a Notion link, and it would read the job description and draft the recruiting email based on that. Or I could say, personalize the recruiting email for these people. And boom, it sends the email, and I can just customize it here. Again, this UI is basically built by the AI at runtime, and I can send the email. I'm just going to cancel here because I don't actually want to send this email. You can also ask Lindy to help you with recurring tasks. So for example, here, and this one is a prototype, I can say, before my meetings, send me the Zoom and attendees' LinkedIns and summary of the last interactions. And it's going to create this automation, which is that five minutes before every meeting it's going to grab the Zoom link, grab my emails with these people, grab my meeting recordings with these people, and grab the LinkedIn of these people, and then send me a summary of all of that on Slack. Back to the real demo. Again, it can grab the meeting recordings because it's currently already joining my meetings. So it joins my meetings. I can see my list of meeting recordings here, and I can click on any meeting. I see that it takes notes for me. It summarizes the meetings, and then I can chat with it about my meetings. What was the meeting about? What did Flo think? What was the takeaway of the meeting? Or I can say, hey, can you update my Salesforce based on this meeting? Or can you create tasks on Linear based on this meeting? And so on and so forth. So it's basically like an AI employee. And just like an employee, you can talk to it however you'd like. You can send it emails, you can talk to it on Slack, or you can invite it to your meetings and collaborate with it however you'd like.

Nathan Labenz: (9:21) Boy, there's a lot here that I think is fascinating and I'm excited to unpack. For one thing, I see that you had a monologue that went longer than the recommended max of 2:30. That sounds like the sort of AI coaching that I also need as somebody who has more than once gone beyond the maximum monologue recommended length. So I think that's funny right off the bat. We're seeing these kinds of coaching interactions coming back to us from our AI assistants. In some ways they already know better than we do. But let's unpack a lot of what we just saw because you showed a lot very quickly. First thing that I want to understand is just how does this thing handle my accounts? Because I've experimented with some similar paradigms, even going back to the GPT-4 red teaming days. One of the first things I thought was, holy moly, this thing is so smart. It might be able to use itself as a tool. It's so good at coding, maybe it can even break down problems and delegate the solving of those problems to itself recursively, and who knows what it might be able to accomplish then. What I found was one of the biggest challenges practically in getting that to work was just everything is protected by authentication. It's hard to, especially if you're just one person like I was at the time exploring something new, it's hard to get enough stuff connected and to figure out all that security logistics to actually get something to work. So obviously in your demo there you had stuff set up, but what is the new user experience and how are you thinking about that, both from a user experience standpoint and ultimately, obviously security is going to be critical to this application as well?

Flo Crivello: (11:14) So it's a little bit like onboarding an executive assistant. When you start working with an executive assistant, you hand over to them the credentials of whatever tools they're going to need to get their job done. So here it's actually more secure than that because you don't give the AI your password. You just OAuth with Google or Linear or Asana and so on and so forth. And we make you OAuth with them on a just-in-time basis. So the first time you ask it, hey, help me find time with Nathan, it might say, look, I need access to your calendar and your email to do that. And then you connect with Google and that takes two clicks. Yes, it's funny what you say about recursively calling itself because that is something that it does. So the way it works is that it basically writes code, and I'd love to get back to that because I think it's insanely interesting, the fact that it writes a one-time piece of code for this. But when you ask it to do something, it writes a piece of code to fulfill the task that you asked it. And in that code it's got functions to call itself back. So for example, in the example that I just gave of summarizing your last interactions with someone, it's going to write a piece of code that's like, hit the Zoom API to grab the recordings, hit the Gmail API to grab your emails with this person, and then call myself back to summarize these pieces. Again, to go back to the code thing, I always compare the moment we're in to the Industrial Revolution. I think the name of the podcast, the Cognitive Revolution, is very apt. The parallels between this and the Industrial Revolution are striking. The Industrial Revolution made some goods so cheap that we could now have single-use goods. We have Bic pens, disposable Bic pens, and Solo red cups that you use only once and throw away. That was unthinkable before the year 1700. A cup was probably the equivalent in today's dollars of, I don't know, $500 or something. And now you have cups that are like five cents. There is this crazy story of Napoleon. At some point he had a banquet with a bunch of world leaders, and the lower-ranking leaders had silver cutlery, and the higher-ranking leaders had gold cutlery, the highest-ranking leaders had platinum cutlery. He, Napoleon, had the highest status material of all: aluminum. And I'm drinking LaCroix right now, and aluminum has become so cheap we literally throw it away. I'm drinking from aluminum, and then I'm going to throw this away. This would have been unthinkable. I think it's going to be the same thing with code. An app today costs on the order of $10,000 to $100,000 to build. We're rapidly approaching a world where an app costs on the order of one to ten cents to build. Once that happens, you basically start having disposable apps. You start building an app just for this one session that you have right now. That's basically what Lindy is doing. You give it a use case. It's going to build a one-time app. It's going to write the code for that app for that one session. And then when you're done with it, it's just going to throw the app away.
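To make the "disposable app" idea concrete, here is a minimal, hypothetical sketch of the kind of one-time script such an agent might generate for the meeting-brief example above. None of this is Lindy's actual code: the connector functions are stubs standing in for real OAuth-backed API clients, and call_self stands in for the agent calling the language model back on intermediate results.

```python
def zoom_get_recordings(participant: str) -> list[str]:
    """Stub for a read-only Zoom API call."""
    return [f"(stub) recording transcript with {participant}"]

def gmail_search(query: str) -> list[str]:
    """Stub for a read-only Gmail API call."""
    return [f"(stub) email thread matching {query!r}"]

def call_self(task: str, context: list[str]) -> str:
    """Stub for the recursive step: the generated code calls the model back
    to do the part plain code cannot do (here, summarization)."""
    return f"(stub) summary for {task!r} over {len(context)} items"

def brief_me_before_meeting(attendee_email: str) -> str:
    # Gather context through read-only API calls...
    recordings = zoom_get_recordings(participant=attendee_email)
    emails = gmail_search(query=f"to:{attendee_email} OR from:{attendee_email}")
    # ...then recurse into the model, and throw this whole script away after.
    return call_self("Summarize my last interactions", recordings + emails)

print(brief_me_before_meeting("nathan@example.com"))
```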

Nathan Labenz: (14:09) That is fascinating. I've done that a couple of times already. Just to date the recording of this podcast, we are at GPT-4 plus 7, as I've started marking everything from the date of GPT-4 release. And it has been amazing: a couple of times when I had a particular need, whether for a little data analysis or making a little chart out of some data, GPT-4 could do that straight out of the box for some basic things. It obviously does not have all of the authentication and sophistication, let alone the recursive capability or access to a runtime. So there's a lot here that you are adding on to the core functionality. But even just with the base, I see some of that potential. I'd love to hear use cases that you see for these one-off, single-use applications, and maybe how that compares to the example you showed where it was setting up an automation, because that's not a single use, right? That's an ongoing thing. And then I also really want to get into the consumer versus business paradigm here a little bit. I'm thinking about this: is this something you think individuals will use on an ad hoc basis to do whatever they need to do? Or do you see this getting into business and becoming part of business processes as well? There's a lot there, so take your time.

Flo Crivello: (15:40) The first use cases that Lindy does very well are the ones you would give to an executive assistant. So the few very big ones are going to be emailing, calendaring, contract sending, prospecting, and recruiting. For emailing, when you wake up in the morning with Lindy, you open your inbox and Lindy triages your emails for you. She tells you, "Hey, these emails are really important for you to look at right now." And she pre-drafts replies to the emails based on not only your voice, but your voice for that particular recipient. So she learns how you speak to each person. You probably don't speak the same way to your wife as you do to investors, hopefully. And so she's going to draft the email. So you open your inbox in the morning, the emails are pre-drafted, and you can review them, you can edit them if needed, and then you can send them away. So that's one example. Calendar: she can manage your entire calendar. You can be like, "Find me time with Nathan." She can handle conflicts automatically. She handles arbitrary preferences, so I can say, "Hey, I don't want any meetings on Friday. I don't want any meetings before 11 AM. That's my focus time." And she's going to respect those preferences. So those are some of the very big use cases that it handles quite well right now. It's basically replacing an executive assistant for a lot of people. Regarding the consumer and professional thing, I think we're basically seeing a new type of computing. I think of this as the next operating system. And it's funny, it's actually an inverse operating system, because normally an operating system lives on top of your hardware, underneath your applications and data. This lives on top of your applications and data. So you just start using this instead of using a lot of your applications directly, and it patches together all of your applications and data to do the work that you want to do. So at the end of the day, I think this is just how people, regardless of whether they're consumers or professionals, are going to use their computers moving forward. I think that the computing experience of the future is not you doing work on a computer, it is you having a conversation with your computer. And then the computer works with you to do the work or even does the work for you.

Nathan Labenz: (18:00) Yeah, that certainly is the dream, and it seems like it's becoming a possibility rather than just a dream extremely quickly. Could you tell us a little bit about how you decided that this was the moment to pursue this dream? I mean, in some sense, this goes back to The Jetsons, right? Or postwar science fiction or even to some degree, wartime prewar science fiction. And it's always been kind of "one day maybe we could achieve this." It seems like now, again, everybody's got the sense that this is coming into focus. What gave you the confidence to set out to build a product? And how are you thinking about this moment that we're in where presumably you're going to see a pretty healthy amount of competition from other startups, but even more so probably there are rumors of a new Siri, Google Assistant is going to get a lot smarter, we have to imagine. So how are you seeing your place in the landscape and the opportunity to carve out a niche for yourselves with this business?

Flo Crivello: (19:10) So, how did I decide now was the time? I mean, I feel like, and perhaps the audience knows this as well, it's very obvious, right? When you're close to it, it's so obvious that there is something unprecedented going on right now in AI. I have been following AI for more than 10 years. The ImageNet moment sucked me in, like it did for a lot of people. CNNs and RNNs started to work. At the time, I was a software engineer and I followed some AI courses on Coursera. And it was super exciting, but I couldn't come up with use cases back then. And I kept following along, and I actually got really excited and almost started a startup in this field when GPT-2 came about. I wanted to start something in enterprise sales with GPT-2. I played around a little bit with it, did some experiments, and still it didn't feel quite ready just yet. And then GPT-3 came about, and ChatGPT came about very recently. And when you start playing with these products, you get very good results very quickly. And then you're like, okay, now is the time when this is actually starting to become very valuable. Regarding the competition, I'll say a few things. One, I don't think too much about the competition. I have only so many minutes in the day, so I try to spend them thinking about my customers and not too much about my competitors. I think it's Jeff Bezos who said, your competitors aren't giving you any money anyway, so why think about them? I also think it's going to be such a huge market that it's just going to be the mother of all markets. I think there are going to be many winners in this space. I also think the Google Assistants and Siris face the one handicap that all incumbents have, which is speed. I think it's Reid Hoffman who said, "We are driving at night on uncertain terrain into the fog, and no one can see very far ahead. And no one is strong in that kind of environment, but I think that huge companies are even less strong." I think startups thrive in this kind of chaotic environment, so structurally they're going to have an advantage here.

Nathan Labenz: (21:19) How do you think that plays out in practice? Obviously there's the speed factor and just the willingness to ship without a million sign-offs and all that. Everybody's familiar, I think, with those dynamics. Do you see specific features or use cases or paradigms that you think you or startups in general are better able to embrace as compared to say, a Siri or a Google Assistant?

Flo Crivello: (21:49) Yes, I think, so to go back to your question earlier about consumer or professional, although I see this eventually as a universal new computing paradigm, right now we are focused on the executive assistant use case for busy professionals. You have too much to do. You have too little time, too many meetings on your calendar. We help you put order into this chaos. That is something that Google Assistant or Siri or Alexa can't do. They have consumer products. And these companies are so huge that they have to cater to a universal audience from day one. Right? It's the same reason why Apple has FaceTime and they've actually been fleshing it out. Now you can send FaceTime links, calendar invites and all of that stuff. But Zoom is still a formidable company because they are focused on the B2B professional use case that Apple structurally cannot go after. Yeah, I think that is one example of an instance where startups will have an advantage. It's not one specific use case, it's more the ability to focus on any use case versus having to cast such a wide net. I think that's one. Yes, to your point, the bureaucracy and the legal rounds of review inside companies are another thing. I also think generally, yes, startups can be more nimble, they can change direction, they can make more experiments, and they can take more risks. You can just allow yourself to release a product that is slightly more rough around the edges, slightly less ready, and you don't risk what happened to Google where they announced Bard and there were issues and they lost billions of dollars of market cap overnight. That's not a risk for you as a startup.

Nathan Labenz: (23:29) So let's talk a little bit about how it all works. I mean, you've given us a little bit of insight into that, but I'd love to understand. One of the big themes that we've heard in talking to different entrepreneurs and builders is this constant advancement of the models. And there's then also the question of how do we think about using the best available model, which right now is pretty clearly GPT-4. I need to spend a little more time with Claude v1.2 as well, but I think it's safe to say GPT-4 is ahead for the moment. So it can do more and more stuff. Then you may also think, geez, that could be expensive, or there may be some things that it can't do that we need to train our own models to do, so we may have a mix of models. So I'd love to hear your thoughts on the mix of models first, and then I also want to talk about memory and client profiles, but we'll come back to that. So tell me first, how are you thinking about what models to use? Is it a mix? Are you training your own?

Flo Crivello: (24:28) The criterion we use to make these decisions is the quality of the responses. That's our one and only criterion, and I really do insist on it being the only one. We don't care about cost, for example. Right now, that's the guidance I've given to the team. People say, "GPT-4 is so expensive," and I say, "I don't want to have this conversation. We're not talking about cost in this room." Right now, the cost curve is on our side. I think the cost of these models has been divided by 20 over the last year. This is not a problem. So right now, Lindy is costing us a pretty penny per customer. It's dozens of dollars per month per customer. And that's just okay with me. I don't really care about that just yet. With that said, right now, GPT-4 is head and shoulders above everyone else, and we've tried pretty much everything out there. But it is often surprising to people to what extent you can actually get better results than GPT-3.5, at least, on one narrow use case if you build your own model for that one use case. To your point, I think the landscape is changing super rapidly right now. And so what we've had to build is the infrastructure to be model-agnostic: to be able to swap out the model super quickly and retrain new models very quickly once a new open-source option out there starts working. Right now, we mostly run on GPT-4, but we are also building and fine-tuning our own models, and they are getting better very rapidly on our benchmarks. And so eventually, I think we're going to have a mix of GPT-4 and our own model, again, purely for quality reasons. We have a ton of data that we have collected through a lot of means. We're using that data to fine-tune our own model. And that actually makes it almost as good, and I think soon better than GPT-4, again, for our one specific use case.
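As a rough illustration of what "model-agnostic infrastructure" can mean in practice, here is a toy sketch, with entirely invented names, of routing tasks to interchangeable model backends where benchmark-driven quality decides the routing:

```python
from typing import Protocol

class Model(Protocol):
    def complete(self, prompt: str) -> str: ...

class GPT4Backend:
    def complete(self, prompt: str) -> str:
        return f"(stub) gpt-4 answer to {prompt!r}"

class InHouseBackend:
    def complete(self, prompt: str) -> str:
        return f"(stub) fine-tuned model answer to {prompt!r}"

# Per-task routing driven by internal benchmarks: quality is the only
# criterion, and swapping a backend is a one-line change here.
ROUTING: dict[str, Model] = {
    "scheduling": GPT4Backend(),
    "email-drafting": InHouseBackend(),
}

def run(task: str, prompt: str) -> str:
    return ROUTING[task].complete(prompt)

print(run("scheduling", "Find me half an hour with Nathan tomorrow."))
```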

Nathan Labenz: (26:23) I think what you said there about not caring about cost right now definitely seems to be a trend, and it's one that I subscribe to as well. It is crazy to think that just six months ago, I believe it was August, the cost of the mainline models dropped from 6 cents per thousand tokens to 2 cents. They then dropped that further to 0.2 cents per thousand tokens, and then reintroduced a higher price point with GPT-4. That difference right now is basically 20x, with the structure being a little different in that GPT-4 prices input tokens differently from output tokens. But it is interesting to hear that basically, yeah, just pay up for the 20x difference. All we care about is quality, and we have enough faith in price drops continuing and in our own ability to engineer stuff that it will resolve itself over time. I do think that makes a lot of sense, but it does take some conviction about where the world is going.

Flo Crivello: (27:35) Conviction and money, yeah. But it's not a tough decision to make, really. When you look at that curve, it's going down so smoothly. It has never bent the wrong way, and here it's moving even faster than Moore's Law, an order of magnitude faster. So the real question, and it's a bit of a mouthful, is: does cheap get better faster than good gets cheaper? Our bet, and so far history has always borne this out, is that good gets cheap very fast, and cheap doesn't necessarily become good.

Nathan Labenz: (28:11) Yeah, that's interesting. So how much easier has GPT-4 made your life in building this product relative to 3.5?

Flo Crivello: (28:22) A lot easier. It's pretty shocking, the improvement. I think the bigger context window alone is a huge deal. You used to have to hack your way around the context window and summarize and embed and fine-tune and do a bunch of crap to not have to deal with the context window. The context window is still a constraint, but much less of one. It's really not something you have to worry about nearly as much. So that alone is a game changer. You also need to do a lot less prompt engineering for GPT-4. GPT-3 and GPT-3.5, you had to coerce them into giving you the outputs that you expected. And so lots of prompt engineering is like, "You are a smart assistant. You don't want to do X, Y, Z." GPT-4 works out of the box a lot more often. It works zero-shot. You just ask it to do something with zero prompt engineering around it and it just works a lot more often. So it's made a pretty dramatic difference. I would say the only downside so far of GPT-4 has been the speed of it, or lack thereof. It's pretty slow. The generation is quite slow. And so you may have noticed during the demo just now, it takes a few seconds for it to answer. Well, it's not the end of the world because the way we think about it is even if the model takes 30 seconds to answer, which it doesn't, even if it took 30 seconds, that would still be way faster than a human executive assistant. So I think that in a lot of ways, the product is superhuman as an executive assistant out of the box. It's available 24/7 and it answers in 10 seconds or less.

Nathan Labenz: (30:03) Yeah, in many respects I think that's going to be undeniably true. For one thing, there just aren't that many executive assistants out there that know how to code. Although they can increasingly figure that out with GPT-4 as well, perhaps. But one thing that a human teammate is going to do that I think you're also working toward, though it's not quite clear to me exactly how you'll accomplish it yet, is really getting to know you over time and knowing your tendencies and preferences. You mentioned relationships and talking to different people in your life in a somewhat different way, a different voice. What I see out there in the world right now is the default paradigm: we embed all your stuff, and then at runtime, when you ask for something, we translate that to a database query against the embeddings, we pull out stuff and rank it, and then we take whatever pops up to the top of that ranking and stuff it into the context window for the language model to use to inform what it's going to do. And I think that makes a lot of sense, certainly for things where I have just tons of content sitting around, or for companies who have internal knowledge management systems already that they might want to layer a chat onto. That makes a ton of sense. Are you doing that? Or it sounds like you may be doing something a little bit different, because you said the model writes code to search. It sounded like you were talking about just searching my Gmail directly, for example. So how do you think about ingesting all this content and embedding it to have a semantic search, versus using the many search APIs that exist, or maybe it's a combination of both?

Flo Crivello: (31:57) So we do a mix of three things. There are basically two kinds of things that the model needs. It needs your preferences and it needs your data. And we do three things to get to those. So for the preferences, it's funny actually. This is an example of a time when GPT-4's context window really helped, because at first we were like, "Oh, let's do what you just described. Let's encode your preferences in a vector database and pull them when you ask for something." It's like, "Hey, help me find time with this person." And then we vector search against your preferences that said, "Don't find time on Fridays." And so I'm going to inject that into the context window. And then we took a step back and we were like, "Wait a minute, guys. GPT-4 has a 32,000-token context window. How many preferences do people have? You're not going to have 30 pages of preferences to give to your assistant." So it's like, let's just dump all of that up front in the context window and we'll deal with that later once you really have too many. And that works really well. So that covers the preferences. Then the two other things we do are for your data. So for example, if I go, "Write an email to Bob about this report that I just wrote in Google Docs," or whatnot, there are two ways that Lindy is going to pull this data. One is using what we call a context injector that lives upstream of the code generation model. The context injector takes a prompt from the user and tries to figure out what questions it needs to answer, what context it needs to fulfill this prompt, and where it can find this context. And then it pulls this context from these sources and injects it into the prompt. So it hydrates the prompt. So for example, if I go, "Send an email to John." Suppose I literally just said that: "Send an email to John." The context injector is going to go, "Who's John?" So that's the first question. Who's John? How am I going to find that out? It's like, okay, I have all of these sources of information. I have his contacts, I have his calendar, I have his email, and I'm going to use a mix of all of these to find out who John most likely is. And so if I always work with the same John day in, day out and email him, it's going to be like, "Okay, this is John." So that's the context injector. And then even without the context injector, the last thing we do is that the code generation model is also able to write code to pull this context at runtime. So if I say again, "Write an email to John following up on this Google Doc," the Google Doc thing is probably going to be handled at the code generation phase, and the code is going to be like, "Hey, I'm going to hit the Google Docs API, pull this Google Doc, summarize it, and then send an email to John about it." So that's the last part of how we inject this context.
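Here is an illustrative sketch, with made-up names and data, of the two ideas just described: a context injector that resolves an ambiguous entity from the user's sources and "hydrates" the prompt, with standing preferences dumped wholesale into the context window. A real injector would use an LLM for the entity-spotting step; the hard-coded check below just shows the shape of it.

```python
CONTACTS = {"John": ["john.doe@acme.com", "john.smith@foo.io"]}
RECENT_EMAIL_COUNTS = {"john.doe@acme.com": 42, "john.smith@foo.io": 1}
PREFERENCES = ["No meetings on Fridays.", "No meetings before 11 AM."]

def resolve_person(name: str) -> str:
    # Pick the candidate the user actually emails day in, day out.
    candidates = CONTACTS.get(name, [])
    return max(candidates, key=lambda c: RECENT_EMAIL_COUNTS.get(c, 0))

def hydrate_prompt(user_prompt: str) -> str:
    context = []
    # Illustrative stand-in for LLM-driven entity extraction.
    if "John" in user_prompt:
        context.append(f"'John' most likely means {resolve_person('John')}.")
    # Preferences are short enough to dump wholesale into the context window.
    context.extend(PREFERENCES)
    return "Context:\n" + "\n".join(context) + "\n\nTask: " + user_prompt

print(hydrate_prompt("Send an email to John."))
```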

Nathan Labenz: (34:36) And then do I imagine correctly also that you're building your own profile of each user that lives in its own place, in your database that didn't exist otherwise?

Flo Crivello: (34:51) Yes. So we connect to all the sources of data that you feel comfortable connecting us to. So your email, your meetings, your documents, all of that stuff. And then, yeah, it basically knows you better than anyone, and it can use all of the data to personalize everything. I wanted to highlight that privacy obviously is number one. So we've actually laid out seven constitutional principles, and privacy is one of them. It's actually funny. The etymology of the word "secretary" is "secret." So this is a person that can hold your secrets. So we take that super seriously. Lindy never leaks information, and we as a company are not in the business of selling this information. We will never do ads, for example. What we do is we charge you money for access to Lindy. And so that aligns the incentives quite nicely and we take the privacy super seriously.

Nathan Labenz: (35:41) Yeah, that'll definitely be an important selling feature to continue to emphasize, I'm sure. The constitutional principles I also think are really fascinating. I mean, on some level, it's like a company core values type thing, and that is fairly familiar, but I couldn't help but also call to mind Anthropic's recent constitutional AI publication where they basically spell out a small, I think it was just two pages worth of guidance for "here's what we want our AI to be, here's how we want it to treat people. We want it to be helpful, honest, harmless." And then they've devised this sort of self-correcting mechanism whereby they're able to use the model to critique its own performance according to its principles and suggest improvements, ways it might have been better and more in line with its principles. Is that kind of a paradigm that you're developing internally as well? How deep does this constitutional concept run?

Flo Crivello: (36:49) Yeah, 100%. It runs super deep. So we wrote down these principles. We put them on our homepage as a way for everybody to be able to review them and for us to hold ourselves to them. And then we use them at every level of the company. We use them when we make high-level strategic decisions. Is that aligned with our principles? We use them when we make lower-level product decisions, like, "Hey, what do the principles say here?" And then, yeah, we use them when we train the AI. So we fine-tune the AI on data. We use RLHF. We use these principles to do each of these things.

Nathan Labenz: (37:26) One thing that I thought was really interesting was principle number five, comfort with ambiguity, which has historically been a tough one for any computer system. Obviously, with the language models, we're getting a lot of progress coming our way for free, so to speak. But I thought you had some really interesting comments when we were speaking about this, about how you want the tool to be able to do things for you. You really want to get stuff done. But you also have to be mindful of context and understand how confident are we that we're about to do the right thing and what would be the cost of making a mistake. So tell me about that part of the paradigm.

Flo Crivello: (38:12) Definitely. So the image we always use internally about this comfort with ambiguity is we say that Lindy takes a message to Garcia. "A Message to Garcia" is a very famous essay. It's set during the war between the US and Spain over Cuba, and there was an American general who wanted to get a message to the head of the Cuban insurgency, Garcia, who was hiding somewhere in the mountains. He takes a letter, goes to a soldier, and hands the letter to him: "Take this message to Garcia." And the soldier asks, "Who's Garcia?" So the general takes the letter back, goes to the next soldier: "Take this message to Garcia." And the soldier asks, "Where is Garcia?" Then the general takes the message back, goes to a third soldier: "Take this message to Garcia." And the soldier goes, "Done." Right? And so that's what we're aiming for. So again, the concrete example that I gave just earlier applies here: "Send an email to John." You don't want your assistant to ask you who John is if you're meeting with John day in, day out. Now, one of the other founding principles is reliability. Lindy doesn't screw up. And so there is a little bit of a tension between reliability and comfort with ambiguity. If you take too much initiative, you may screw up from time to time. And so given the choice between both of these principles, reliability always wins. We never sacrifice reliability. And to decide, "Hey, can you feel comfortable doing this or not?" there are two factors that we use: perplexity, and the stakes of the decision that Lindy is making. The perplexity is, "Hey, how certain are you that this is the right thing to do?" And the stakes are, assuming you screw up, how bad is it? So for example, Lindy will not take the initiative to send an email that fires someone. Like, whoa, whoa, whoa, assuming this is the wrong thing to do, how bad is it? If you're firing someone you're not supposed to fire, it's pretty bad. So Lindy uses both of these factors to decide whether to move forward or whether to ask for confirmation. One other heuristic we use generally is, is the action that you're taking read-only or is it read-write? Read-only actions: "I'm going to go ahead and search your email." Or for example, I'm moving apartments right now and I have cats, so I would ask Lindy, "Find me a way to move these cats. I'm going to take a plane, what are the requirements?" And Lindy could also go, "I'm going to research online options for cat moving companies as well." There's no risk involved in that. Read-write actions: "I'm going to send an email on your behalf. I'm going to make a purchase." For read-write actions, Lindy basically always asks for confirmation.
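The perplexity-and-stakes heuristic lends itself to a simple decision rule. Here is a minimal sketch of how such confirmation gating might look; the thresholds and field values are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    read_only: bool
    stakes: float      # 0.0 (harmless) .. 1.0 (firing someone)
    perplexity: float  # higher = the model is less certain this is right

def needs_confirmation(action: Action,
                       perplexity_cutoff: float = 5.0,
                       stakes_cutoff: float = 0.3) -> bool:
    # Read-write actions (send an email, make a purchase) always confirm.
    if not action.read_only:
        return True
    # Read-only actions confirm only if uncertain or high-stakes.
    return action.perplexity > perplexity_cutoff or action.stakes > stakes_cutoff

print(needs_confirmation(Action("search email", True, 0.0, 2.1)))  # False: just do it
print(needs_confirmation(Action("send email", False, 0.2, 1.0)))   # True: confirm first
```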

Nathan Labenz: (41:04) Interesting. Okay. So I was going to ask, first of all, about just the confidence when it comes to moving forward on a particular action. Is that something... I haven't actually had a chance to dig into this yet, but I had heard somebody say that with GPT-4, OpenAI was no longer returning the log probs for the top... Historically, they've returned the log probs for the top five most likely tokens at least, so you could see what the leading options were via the API. That's how I would assume you would do something like this, assuming you're using GPT-4. But I had heard that that function maybe wasn't there anymore. So do I have that wrong or is there another strategy that you're able to use?

Flo Crivello: (41:46) No, that's right. That is another downside of using GPT-4 for us. My hypothesis is that OpenAI is removing this in order to remove the ability of competitors to distill their models. Because you can use this perplexity to actually build your own GPT-4 out of GPT-4. And I don't think OpenAI is too happy about that. It sucks though. For our use case, it really sucks we don't have access to this information anymore. And that's another reason why we are looking into building our own model.

Nathan Labenz: (42:15) Yeah. So for the moment, what else do you have available to you in the absence of the log probabilities?

Flo Crivello: (42:22) We've trained another perplexity model. It takes the prompt and the context as input and the action you're about to perform as output, and it tries to figure out, "Hey, does this action really follow from all of that with high certainty?" We're still refining that model right now.
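For reference, where an API does expose per-token log probabilities, perplexity is conventionally computed as the exponential of the negative mean log probability per token:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    # Lower perplexity means the model found the sequence less surprising,
    # i.e. it is more certain about what it generated.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

print(perplexity([-0.1, -0.3, -0.05]))  # ~1.16: confident
print(perplexity([-2.5, -3.1, -1.8]))   # ~11.8: the model was guessing
```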

Nathan Labenz: (42:37) Gotcha. Okay, cool. That's interesting. And then on the action side, the main approach is just to present the action to the user for confirmation, which makes a ton of sense, obviously. Do you think there's potential there also for a sort of ensemble of models? Similar to the perplexity side, I could imagine you having a good-judgment model that comes in and says, is this a good idea, or should I hit the brakes, before the more base action model does its thing?

Flo Crivello: (43:11) Yes and no. We do compose models together, or rather agents and workflows together. But that complexity and those guardrails are present every step of the way. We constantly apply guardrails, and we have many of them. And again, reliability and not screwing up is one of our top priorities. We do compose models and workflows, and I think in a really good way. So for example, you can go, "Hey, when I ask you to send a contract, I want you to do that via DocuSign," because there are a bunch of alternatives out there. So: go to DocuSign, generate the contract, I have templates, and then send it by email. That's what I mean by "send a contract." So now I can be like, "Hey, send an NDA to Bob." And then I can go, "Hey, when I ask you to onboard a vendor, what I want you to do first is send them an NDA." So that "onboard a vendor" workflow now uses another sub-workflow, which is "send a contract." And you can end up building an entire constellation of workflows like that.
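Here is a toy sketch of that kind of workflow composition, where a workflow is just a named sequence of steps and a step may itself invoke another workflow; all names here are illustrative, not Lindy's actual representation:

```python
WORKFLOWS = {
    "send contract": ["generate DocuSign envelope from template", "email envelope"],
    "onboard vendor": ["send contract:NDA", "add vendor to payables system"],
}

def run(workflow: str, arg: str = "") -> None:
    # Unknown step names are treated as primitive actions.
    for step in WORKFLOWS.get(workflow, [workflow]):
        name, _, sub_arg = step.partition(":")
        if name in WORKFLOWS:
            run(name, sub_arg)  # recurse into the sub-workflow
        else:
            print(f"executing: {step} {arg}".strip())

run("onboard vendor")
```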

Nathan Labenz: (44:18) Very interesting. Kind of a two-part question next. How far do you trust this today? If you say, go book me a flight, can you just be confident you're going to get a flight that satisfies your need? What kind of hit rate do you think you would have on something like that? And what about actions that can't be taken via API? I actually don't know; I know you can search for flights via an API, but can you buy a flight via an API? And whether you can or can't, what happens when you need to execute a transaction online that you just can't do with generated code?

Flo Crivello: (45:00) I trust it quite a bit, because I think we are doing a pretty good job with the guardrails, it's asking me for confirmation before doing anything, and it's doing a pretty good job at taking my preferences into account. So yes, I do trust it to book flights. "Hey, Miami to SFO." It's going to send me options, I'm going to be like, "Option two," and then it's going to take care of everything else for me. That's amazing. On the API thing, first of all, you would be shocked to see how much you can do via APIs. You can really do most things. From time to time, we will run into something that you cannot do via an API. That fallback is not available quite yet, but we're working on it: we are also building a web-browsing agent. So we're going to have that fallback where, if it needs to use the web and not an API, it will be able to do that. And we're making fast progress on that as well.

Nathan Labenz: (45:49) You showed a little bit of the product and the user experience earlier. How do you think that evolves? Does it stay a single text box that you can interact with anywhere? Are we going full Her here? What do you envision my day-to-day use of this tool being over time?

Flo Crivello: (46:13) Yeah, I think it's going to have a lot of contexts and a lot of incarnations, and we actually view building these contexts as one of our key reasons to exist as a company. We basically want to be to large language models what the PC was to the CPU. The PhDs have done their part: they've given us this amazing thing that works really well, but it is not an end product. It's really a miracle that large language models are so usable today with just a text field, but they can go so much further than that if you package them up, give them access to the right tools, the right applications, the right data, and the right front ends. So, yeah, we don't think of ourselves as building a text box. We think of ourselves as building that holistic product with all of these formats. You will be able to send it emails, talk to it on Slack, invite it to your meetings, send it voice memos. Eventually, we do want you to be able to give it a phone call and have it actually respond to you. So to answer your question, yeah, it will do all of these things for you.

Nathan Labenz: (47:12) You don't have to answer this question, just calling back one of our earlier episodes that we did with the founder and CEO of Replika, what's your stance on erotic role play with Lindy?

Flo Crivello: (47:23) That's not a cool job to be done.

Nathan Labenz: (47:28) We'll maybe see what jailbreaks your early users might be able to pull out of it. In all seriousness, you do lean on the safety work that OpenAI has done under the hood. And it'll be interesting to see, too, how people choose to use something like OpenAI, which, for all the jailbreaks and whatnot that we have seen online, has done a lot to bring things under control and made a ton of progress. Versus just this last week, we had the LLaMA release from Facebook and then the Alpaca fine-tune of it coming out of Stanford. And there was this brief moment where it was like, oh, it's just like text-davinci-003. And then it was like, oh, no, it's not; we're taking it down based on community feedback about problematic uses. I don't know if there's a question there, but I wonder about the degree to which training your own models opens up a whole can of worms in terms of safety and edge cases and blind spots that you maybe don't have to worry about as much if you use such a well-established provider. Any thoughts on that?

Flo Crivello: (48:40) It does open that kind of window. And to me, that's a whole other broader conversation around AI safety, which is, look, I'm very happy that OpenAI is taking safety seriously enough, though even then, they can't really stop jailbreaks from happening, and they happen all the time. And even if OpenAI was doing a perfect job, which they are not, you can't really stop other people from building their own models. And so whether we want it or not, we all are going to have these models out there, and we all are going to have them do basically anything any human wants. So I do think that is something we're going to have to grapple with as a society.

Nathan Labenz: (49:17) Yeah, no doubt it's coming at us quick. Among the other things we're going to be grappling with: OpenAI, along with a professor from Penn and another researcher, I believe from OpenResearch, published a study of the anticipated labor market impacts of large language model technology. You can read their charts and estimates in a few different ways, but to me, one big bottom line is that they seem to be setting a lower bound of around 25% of all work as the minimum amount that language models could ultimately take on, especially as they're equipped with all the surrounding tools that you are building. So I'd love to hear your thoughts. First of all, how do you assess that report? Does that 25% number seem low, or high? And what do you think it's going to look like for us to adjust to this world where we have virtual employees? As awesome as that sounds, a lot of people are pretty worried about what that's going to mean for society.

Flo Crivello: (50:27) I think there are two time horizons, right? There's the next five years or so, and then there's AGI and ASI. With AGI and ASI, all bets are off. I can't comment on that. No one knows what's going to happen. Certainly, I think it's going to be quite disruptive. In the meantime, I'm not too worried about job losses, because that's just something people have been worried about forever. I think it's just a failure of imagination; humans don't realize that human needs and wants are infinite. And as some tasks are automated, you actually free people up to do other things that only they can do. So look, I think something like 90% of the active population in the early 1900s were farmers. Today, it's something like 5%. So it's a huge transformation, and there is very low unemployment, at least in the US. So not a huge concern. I think it's Malcolm Grayson who's very fond of pointing out that every quarter, I believe, official unemployment numbers come out, and it's always a net number: this quarter, X thousand jobs were lost or created. It's always the net number that makes the headline, but the gross numbers are huge. It's always millions of jobs created and millions of jobs lost, with a tiny difference as the net result. So I think the bottom line here for me is that economies are a lot more resilient and elastic and dynamic than people realize, and they can reconfigure themselves extremely quickly. So I'm not worried about that. I actually am very excited about growing the GDP by 20% and making people 20, 30, 50% more efficient.

Nathan Labenz: (52:09) So one thing that's not there, I wonder if you have a take on this, is the sort of Keynes' vision from 80 years ago or whatever now, maybe even more, where he famously projected that by this point in history, we'd all be working a lot less. The idea was supposed to be that we'd enjoy our material comforts and maybe only have to work a couple of hours a day. I think it was 15 hours a week or something, which was the long range forecast. Obviously, nothing like that has happened. It doesn't sound like you foresee that either. It sounds like you more see people continuing to work hard for the foreseeable future, but just being more productive because they're able to delegate more stuff to AI. Is that right?

Flo Crivello: (52:53) No, I think Keynes wasn't too far off. I think he was just too early and too eager in his predictions. But if you look at hours worked per year per person in the US, it has been going down steadily over the decades. I think we work 30 or 40% fewer hours now than we did 100 years ago. So, no, I think that's also part of the solution. At some point, perhaps you reach diminishing returns, and perhaps you're like, look, we are actually meeting most of our needs, and a lot of people decide to work less hard in order to have less and just to have more time. I think that is certainly going to be part of the solution.

Nathan Labenz: (53:30) Well, I'm hoping for some of that. It definitely seems like more leisure would be a good thing. And I'm personally very much looking forward to the era of robot servants, both digital and potentially even domestic robots. And who knows what else, too. So I know you're super busy. You've got a launch that you're doing, and I appreciate you coming on to talk to us about this. Just a couple of closing questions. One is, how do people find you? How do they find the product? How do they sign up? And then, as the episode releases, we'll do a little promo online where we can get a couple of people off the waitlist and into early access. But just for the general audience, where do they go to sign up, and what do you think the timeline is to really ramp up the user base?

Flo Crivello: (54:22) You can go to lindy.ai to sign up to the waitlist. Right now, it's a very limited private beta. But yeah, any Cognitive Revolution listener, go there and say in the notes of the form that you're coming from Cognitive Revolution and we're going to prioritize you. The timeline is we are onboarding new customers every week, and I think we will reach general availability sometime this year.

Nathan Labenz: (54:45) Sometime this year, it could be obviously next month or it could be nine months from now still. What do you think are the big things that you need to iron out or the sort of, do you have a short list of key problems that you need to solve before you can go really wide with this thing?

Flo Crivello: (55:01) Definitely. I think the reliability of it is one thing. Right now it's working well, but to me it's very similar to self-driving, where there's one fatality every million miles or whatever. So look, you actually need a lot of data. You need three million miles, or maybe a factor of a hundred here, to know whether you're actually doing a good job. So here it's the same thing. We need a lot of data before we're actually comfortable putting that in the hands of a lot more people. We're also just learning so much and so fast about what people want to use this kind of system for. And so as we're learning, we're adjusting our plans, and we want to make sure that we've built and crafted an amazing product before we put it in the hands of the general population.

Nathan Labenz: (55:39) What AI tools are making an impact in your life today? Stuff that anybody can go try, but that you have found yourself going back to?

Flo Crivello: (55:48) Apart from Lindy, GPT-4 and ChatGPT are pretty huge for me. The reveal blog post that I'm releasing as part of this announcement was written with GPT-4. I'm not a terrible writer, but I just hate writing, and it takes me a long time to write one thing. It takes me four or six hours every time. This time, what I did is I just recorded myself and rambled for 15 or 20 minutes in the most unstructured way imaginable. Then I transcribed this using Whisper and fed that to GPT-4 with some lightweight guidance on the style I'm going for, which is colloquial, minimal, simple. And I said, write a blog post with these guidelines according to that transcript. And it did it almost on the first shot, and it took me only half an hour to write the blog post instead of six hours. So that's a huge one for me. There's also, I think it's YC Search or something like that. Some people indexed and embedded all the YC videos and blog posts, and now you can search them, which is pretty awesome. So you can be like, how should I think about hiring? Or how should I think about remote? How should I think about coworking spaces? And you get distilled startup wisdom in five minutes. So that one's been pretty awesome. What else do I use these days? I use Whisper quite a bit. There is a Whisper app called MacWhisper. Unaffiliated, I just think it's a really well-built product, and I use it quite a bit for voice memos and recording myself.
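For anyone who wants to try that ramble-to-blog-post pipeline themselves, here is a minimal sketch using the OpenAI Python SDK (assuming openai>=1.0 and an OPENAI_API_KEY in the environment); the file name and exact style prompt are placeholders, not Flo's actual setup:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Transcribe the unstructured voice memo with Whisper.
with open("ramble.m4a", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

# 2. Ask GPT-4 to turn the ramble into a post, with lightweight style guidance.
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Write a blog post from this transcript. "
                    "Style: colloquial, minimal, simple."},
        {"role": "user", "content": transcript.text},
    ],
)
print(completion.choices[0].message.content)
```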

Nathan Labenz: (57:26) Interesting. You're pretty consistent. I mean, Whisper is one that hasn't come up a ton, but it is awesome. It is interesting, though: we're in this moment where, for all the insane proliferation of apps, with Ben's Bites dropping dozens per day on people, still, when I have asked people on this show what they use, it has been pretty consistent that the main thing is ChatGPT. There's been mention of a number of other things, but mostly people say, yeah, I mostly use ChatGPT, and then maybe one or two other things as well. So I wonder what that ultimately means. It doesn't seem to bode super well for the first wave of applications, anyway. But obviously the second wave, more the kind of thing you're building, there's just a lot more to it than most of the stuff we've seen so far. So I do think that will probably change. But I don't know, it's interesting. It doesn't seem like there's that much room for 1,000 AIs in most people's lives.

Flo Crivello: (58:31) No, I think to your point, this is one area where AI applications diverge from commonly received startup wisdom: a lot of these AI applications are a bit too narrow. Again, it's like the YC search thing, one tool to search YC videos. If you look at Product Hunt, there are hundreds of AI tools coming out every week right now. People don't want to have to manage hundreds of tools and hundreds of logins and hundreds of links. They want one big tool. And I think right now, ChatGPT is the one thing that comes closest to this one big tool. But tools like Lindy are certainly another attempt to create something that is very broad and general purpose.

Nathan Labenz: (59:11) Is there a limit on what this type of thing can do? I mean, you're positioning it, for starters, as kind of an AI assistant, but is it also going to become an AI accountant and an AI lawyer for you as well? Is there any limit on how much this thing can do over time?

Flo Crivello: (59:34) You know, I think of it as Amazon or Craigslist, right? And these things are getting unbundled. Amazon's vision was the everything store: we sell everything. But at the end of the day, we're not saying this is literally going to kill all software. There are going to be point solutions for very specific needs, just as there are alongside Amazon. There's Zillow if you want to buy a house, there are platforms if you want to buy a car, and there are platforms if you want to buy a dress or furniture. Certain verticals have very specific needs, and I'm going to use those verticals when I buy a car or a house or pick out some furniture. But by and large, for everything else, I use Amazon. So that's my mental model here. Lindy is Amazon, and sure, when you want to prospect, you're going to use a CRM or ZoomInfo and whatnot. But I think people are going to have one big text field in which they type whatever they want to get done.

Nathan Labenz: (1:00:26) Okay, here's a hypothetical question. Let's imagine that in a world not too long from now, a million people already have the Neuralink implant in their skulls. If you get one, you'll be able to think and communicate directly via your thoughts with Lindy and the computer in general; you'll essentially have thought-to-text. Would you be interested in getting one in your own head?

Flo Crivello: (1:00:59) Not at first. The privacy and security issues here are problematic. I think I would wait a very long time before getting something implanted into my skull, probably 10 years or more.

Nathan Labenz: (1:01:12) Yeah, you only have one skull. But answers on that have been surprisingly varied; there are some early adopters among our previous guests. Okay, last question. You're setting out with a really ambitious vision to build what you hope will be a big part of our future lives, certainly our computing lives. If you could zoom out even farther than that, what are your greatest hopes and greatest fears for the next handful of years as AI permeates all parts of society?

Flo Crivello: (1:01:44) I do hope we can rid people of menial work. I think we're wasting so many humans right now. Humans are AGI, right? We're seeing so many humans do work that they shouldn't be doing, that robots should be doing: mind-numbing data entry work and stuff like that. And you shouldn't have to spend time going back and forth to find a time, playing with people's schedules. That should be the job of a robot. So I'm very hopeful for that, for sure. Any big fears? I think misinformation can become a problem. Mostly, I have my eye on the very long-term existential risks. I do believe there is a low but non-zero chance that things go very wrong with AGI.

Nathan Labenz: (1:02:28) Would you venture any remedies, prescriptions, regulations, guidelines? And do you have any sense for how we can minimize that risk?

Flo Crivello: (1:02:42) That's very uncharacteristic of me, but I think we need regulation. We don't trust the private sector to self-regulate for basically anything that touches safety. Airlines are extremely regulated, banks are extremely regulated, and so are buildings: when you build a new house, that's extremely regulated for safety reasons. We need to do the same thing with AI.

Nathan Labenz: (1:03:06) Well, it's going to be an interesting few years, to say the least. And I'm looking forward myself to getting off the waiting list and into lindy.ai, the AI assistant. Flo Crivello, founder and CEO, thank you very much for being part of the Cognitive Revolution.

Flo Crivello: Thanks a lot, Nathan.

Nathan Labenz: Omneky uses generative AI to enable you to launch hundreds of thousands of ad iterations that actually work, customized across all platforms with a click of a button. I believe in Omneky so much that I invested in it, and I recommend you use it too. Use CogRev to get a 10% discount.
