In Search of Truth w/ Aravind Srinivas of Perplexity AI

We talked to Aravind Srinivas about competing with the tech giants, Perplexity’s product philosophy and obsession with maximizing value delivered to the user per unit time, and the features that make Perplexity unique.

(0:00) Preview
(1:31) Introduction & Context
(8:06) Aravind’s vision for Perplexity as the world’s most truth-centric company
(9:29) Raising the average IQ of the world
(12:00) Aravind’s favorite tech entrepreneur
(15:12) “Anthropic” - Building your own model vs user experience
(33:00) Why is Perplexity free
(43:00) Talking shop: Perplexity’s tech stack and speed
(1:06:00) Perplexity’s stance on guardrails on search
(1:19:48) Forecasting weakly general AGI: The four criteria
(1:28:00) Aravind’s perspective on OpenAI and the future of AGI
(1:39:00) The Cognitive Revolution’s rapid fire questions: Neuralink implant? Most used AI tools? AI hopes and fears?

Thank you Omneky for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.

Twitter:
@CogRev_Podcast
@AravSrinivas (Aravind Srinivas)
@perplexity_ai (Perplexity)
@labenz (Nathan)
@eriktorenberg (Erik)

Join thousands of subscribers of our Substack: https://cognitiverevolution.substack.com/

Websites:
Cognitiverevolution.ai


Full Transcript

Aravind Srinivas: (0:00) We need to build AGI so that humans can just go back to living, just live a nice life. Not everybody needs to work so hard. A lot of people don't appreciate what's going to happen to them. You don't have to do a ton of work. It'll almost be like you get to live the life of a millionaire or a billionaire. You right now are already living a higher quality life than the president of the United States 50 years ago. You just have access to technology that they could only dream about. So technology is the biggest leveler to making humanity equitable. A lot of people don't get it. And if intelligence is in abundance, you no longer have to compete to be the highest IQ person in your class or something like that. You can do stuff that's interesting and creative to you and learn from the AI.

Nathan Labenz: (0:50) Hello, and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together, we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost, Erik Torenberg. Omneky uses generative AI to enable you to launch hundreds of thousands of ad iterations that actually work, customized across all platforms with a click of a button. I believe in Omneky so much that I invested in it, and I recommend you use it too. Use CogRev to get a 10% discount. Aravind Srinivas is the founder and CEO of Perplexity AI, an AI-enabled search engine which he aims to build into the world's most trusted information service. To give you a sense of the new possibilities and potential for change that AI is creating, consider Google's total dominance of the search market for more than 10 years. Google earned $282 billion in revenue in 2022, a number that is still growing at 10% per year, and nearly $60 billion of that was profit. That's more than $1 billion in profit per week. But despite this incredible scale, profitability, and continued market growth, even the most well-resourced and determined competitor, Microsoft, has barely chipped away at their position, and Google has maintained close to 90% market share. Yet into this market enter Aravind and Perplexity. With a $15 million venture capital round and just 10 or so employees, Perplexity has built a genuinely novel search experience which has meaningful advantages over Google. And despite the fact that Perplexity does not have the same privileged access to the latest OpenAI models as Bing, it continues to hold its own in terms of overall usefulness. We talked to Aravind after the new Bing launch but before OpenAI dropped prices 90%, about how he thinks about competing with the tech giants, Perplexity's product philosophy and obsession with maximizing value delivered to the user per unit time, the features that make Perplexity unique, how he thinks about managing language model costs, managing user identity, collecting user feedback, ensuring user safety, and how Perplexity might eventually monetize. We also covered his expectations regarding AGI, why he thinks AI will drive both wealth inequality and greater equality of access and opportunity, the risks of misaligned AI, and the trust that he has in Sam Altman and OpenAI leadership. I came into this conversation already impressed with Perplexity the product and came away just as impressed with Aravind the entrepreneur and thinker. I hope you enjoy this interview as much as I did. Aravind Srinivas, welcome to the Cognitive Revolution.

Aravind Srinivas: (3:53) Thank you for having me here.

Nathan Labenz: (3:55) Really excited for this conversation. As anyone who follows me on Twitter knows, I am a big fan of what you're building at Perplexity and have been raving about it online and what an awesome search experience it is. You've articulated the mission for Perplexity as becoming the world's most trusted information service, and I'd love to just start off by asking you, what does that mean to you? What does that mean to you in two years? What does that mean to you at the end of the decade? What is Perplexity going to be like as we go into the future?

Aravind Srinivas: (4:26) These are great questions, and I think about these every single day. To start off with, our core founding team has a lot of academic background. And the first thing you're taught when you're writing your first ever research paper is don't ever say anything in the paper that you cannot actually cite. You should be able to reference whatever you say. The reference should either come from some other research paper, or it should come from an experimental result in your own paper. Anything else that you say is an opinion, and it's not a fact. And that was very powerful. It's still stuck with me even now, and that's sort of why we were the first to put out a citation-powered search. Right after ChatGPT came out, I think literally a week or week and a half after it came out, we put out the first version of Perplexity Ask, which literally just combines Google and GPT-3.5 together and takes the top few search results and treats them as links that it can cite, and then gives you this really cool, very short three-sentence summary of what the answers should be. But only tries to say stuff that's already been presented in these links. Of course, there are some hallucinations that might still happen in certain long-tail scenarios, but we are working super hard to reduce that and just try to be as truthful and honest as possible. People loved it. People loved the fact that they can verify what the model is saying. If the model makes mistakes, they have the power to still go and check, read the links for themselves. And if they're lazy and they already trust what we're saying, they don't want to go click on the links, and they can just read the answer on Perplexity. And so this was our first ever release, and we were super happy with how it went. And of course, the whole world was like, oh, I just want a conversational thing. I want a chatbot. I need to be able to chat. Chat is the new interface. We heard that loud and clear. Everybody kept asking us to make it a chatbot, but we did not succumb to the pressure of just making it an actual chatbot. We thought through from first principles and said, why would anyone even want to chat with a tool like this? What does it mean to even chat with a search engine? It's not a person. And I'll connect this to how Bing and Sydney have been implemented. It's pretty different. We thought through this and said, people only want to chat with a search engine if they actually want to ask follow-up questions. And Google sort of implicitly does this for you. It has this "people also ask" thing at the bottom of the search result, and there's always suggested searches, suggested questions. It's sort of like giving you the ability to follow up. And so we did that first. We added related questions and we saw a lot of people clicking on that. And then we learned from that, and then we were like, okay, let's do an actual follow-up. The conversation should support an actual follow-up question. And then that was our entry into chatbots. But it doesn't look like a chatbot. The UI will not look like a chatbot, and that was intentional. We don't want to have these left-right bubbles and chat interface at all. For us, you only chat if you want to dig deeper. And so it's meant to be a knowledge and information service, not an entertainment or chatbot kind of service. So that is Perplexity for you. I've articulated this in public before: we want to be the world's most knowledge- and truth-centric company.
This is inspired by Bezos saying Amazon should be the Earth's most customer-centric company. In fact, in every customer chat that you have with Amazon customer service, after it's over, they say, "Thanks. What can we do more to become the Earth's most customer-centric company?" or something at the bottom. They say that even now. So that's how much they care, and it's an abstract thing. It's not even a detailed thing; what does it even mean to be this? So to us, what it means is if someone else wants to build a truthful and accurate source of answers, any other service, any other product, they should look at Perplexity as an example. That's what it means to us. And honestly, I was so surprised. The news came out that Elon Musk is building a ChatGPT rival, and it says he wants to make it trustworthy. He wants to make it factual. I feel like we have gone further than any other company to solely focus on that mission. And I hope that whatever we did serves as some level of inspiration for him to do it in his own way. If people want to learn something in a minimal amount of time and want to trust the source they're learning from, I want that to be Perplexity. The IQ velocity should be high. The delta IQ divided by delta time, we want to maximize that for you.
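
The citation-grounded recipe Aravind describes, taking the top few search results and constraining the model to answer only from them, can be sketched in a few lines. This is a minimal illustration in the spirit of the early Perplexity Ask flow, not Perplexity's actual implementation; the `web_search` helper and the model name are assumptions.

```python
# Minimal sketch of citation-grounded answering as described above.
# `web_search` is a hypothetical helper standing in for a Bing/Google
# search API call that returns ranked results with titles and snippets.
import openai  # assumes the pre-1.0 openai Python client

def answer_with_citations(query: str, web_search) -> str:
    results = web_search(query)[:4]  # keep only the top few search results
    # Number each snippet so the model can cite it inline as [1], [2], ...
    sources = "\n".join(
        f"[{i + 1}] {r['title']}: {r['snippet']}" for i, r in enumerate(results)
    )
    prompt = (
        "Answer the question in about three sentences, using ONLY the "
        "numbered sources below. Cite sources inline like [1]. If the "
        "sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message["content"]
```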

Nathan Labenz: (9:48) That's really interesting. I think this is becoming a trend in software products generally, where if the 2010 decade was about maximizing time on site, maximizing eyeballs, attention, and then ultimately monetizing an audience, flipping that around and saying, how can I maximize the value given per unit time? It's cliche to say, but definitely still true: time is the one resource we're not getting any more of. And trying to minimize that denominator makes a ton of sense to me. I think you're ahead of the curve on that, but I do think you're also leading what is going to be a bigger and bigger trend.

Aravind Srinivas: (10:34) To add on to that, I think if we can just do whatever we did earlier in a shorter span of time, we just get more iterations to learn. And that's true for both biological neural nets, that is us, and artificial neural nets. The more epochs you do, the more iterations, the more training updates you do, the smarter you get. And so if I make the amount of time you need to learn something much smaller, you just learn more in the same amount of time. And so you are a much smarter person, and that's great for the world. And if more people can do that, then it's even better. The average IQ of the world increases. And Sam Altman said something like this, where the amount of intelligence will double every year or something like that. So we want to accelerate these kinds of things.

Nathan Labenz: (11:26) Certainly, raising the sanity waterline has been something a lot of people have thought about for a long time and tried to figure out how to do, and I think these tools are starting to bring that to life in a very productized way. So you mentioned your academic background. You also spent about a year, if I understand correctly, at OpenAI as a member of the research team there. And then you have obviously jumped off to do your own company. I'd love to hear a little bit about why you decided to do that, what inspired you to do it.

Aravind Srinivas: (12:01) I always wanted to be an entrepreneur. A lot of people's favorite entrepreneur is Steve Jobs or Elon Musk, but for me, it was Larry Page because of the same background. He was a PhD student and turned his academic idea into a company. I was always interested in coming up with a company that wanted to do search. In fact, the first ever idea I pitched to our seed investor, Elad Gil, was, "Hey, the best way to disrupt Google is to make search work on Glass and you don't even have to type anything. You just look at it and you get the results." And he was like, "This is such a bad idea for 2021. You should wait for a little bit." And he ended up being right. But who knew that the best way to disrupt Google was to basically make people not click on links? We were not thinking about that. So I wanted to do entrepreneurship, and for a long time, it was just an academic thing. It was becoming harder and harder. The results were getting better, but it was still considered, "Oh, this is cool research." But things started changing in 2022. GitHub Copilot happened, and it was a successful product, and not just a product, but made a ton of revenue. When they announced that it's a paid version, a lot of people signed up on day zero, and I heard about that. And then I also heard about Jasper making a ton of revenue. And I got to see DALL-E becoming a big hit. And DALL-E was a big moment for me to realize this is no longer a researcher's world. I could see how researchers build a model, but the way it was taken and given to the public, all the effort that was put into marketing and then making it a very easy to use product, reducing the barrier to entry. Everybody became an artist so easily. I was able to create cool art for the first time. These things were no longer a researcher's thing. And I realized, okay, if at all there's a time to change your career from a researcher to an entrepreneur, it's now. So that basically was why I wanted to do a company. The other thing is also that you could get a lot more done as a team than as an individual. And I think at OpenAI, I was an individual contributor researcher. So obviously, you have to be incredibly top notch to do everything on your own. And there are people who are better than me at OpenAI at doing that. And on the other hand, I felt I had a bunch of skills that were better suited for entrepreneurship, like leading a team and getting them to get things done. Of course, I didn't prove that yet, but I felt I had it. So I wanted to take a risk and try it out, and that ended up being Perplexity. It's a lot of good fortune. It's not like I planned it and it all worked out or something. A lot of things went my way. Do we plan to build our own models? I think the answer to that is, again, go back to some first principles. Now, we are a product company, and you guys use it. Any other person outside of AI also uses it. To them, it doesn't matter if we use GPT-3.5 or 4 or o2 or our own models or Anthropic's models. Nobody cares. They just want the answer. Deliver the best answer in the shortest amount of time and make the user experience amazing. They don't care at all. VCs care. If they want to make an investment, they think, "Oh, you need to have a technical moat, checkbox." Right? So they care.
And then maybe some people, if we try to recruit, they care because they think, "Oh, if this company can't train its own models, it's just another Jasper or Copy.ai, and it'll become commoditized and they won't have any differentiation." So those are the people that actually care about these things, but the actual user doesn't care. So as a company, who should you obsess about? You should obsess about the user. So that's why we started off with GPT-3.5. It was the best model in the market. Now, there is a reason to build your own models. The primary reason is not to impress the VCs. The primary reason is to reduce the cost per query. As you know, Google's not doing search in this manner, and it's not because they don't want to. It's because their search volume is like billions of queries a day, and you cannot use an LLM for every query if you're Google. You're just going to burn like billions of dollars a year for that. And so we thought it makes sense to explore this because our search volume is much lower. But as we grow and as our search volume gets higher, we are obviously incentivized to bring down the cost per query, right, until we even figure out how to make money. And in order to do that, we need to train our own models and control the stack so that we can reduce the cost through good engineering. So that's why we are invested in that. We already silently do that. It's just that we don't make a big fuss about it, but there are some queries that you try that are probably running through our own models too. It's just that it takes a long time for you to match OpenAI. And they're also doing a lot of great things in terms of making the cost for companies using their APIs even lower by innovating on the model, innovating on the inference stack. And the whole inference stack that they've built to handle this level of traffic is super hard to match right now. So we're going to play the long game here and be pragmatic, and we're also already foreseeing a future where the prices of these GPUs come down when the next generation of GPUs is available, and the pricing of these APIs will go down even further. And there'll also be a different pricing tier in the future where pricing is not based on consumption, but something else. So you want to be adaptive and dynamic in this world. You don't want to think, "Oh, my model or OpenAI, there's no other alternative." Right? That's the old school 2021 style thinking. I think right now, people are very adaptive and flexible and do what's best for the company and the user, and other things don't matter.

Nathan Labenz: (18:45) I just posted a big thread on Twitter that was inspired by the as yet unconfirmed foundry pricing from OpenAI, which kind of indicates that they're moving to a dedicated compute model. And it's not clear from the leak how much throughput that gives you and how much of a discount that might be. But that is definitely a little bit of a window, it seems, into what is coming. It's not just going to be tokens.

Aravind Srinivas: (19:13) This is just the beginning. They're going to try out further iterations on this. The models are going to improve even further, and it's going to get cheaper to serve these models for them because of NVIDIA's work on the hardware and the inference stack, and the throughput's going to get better over time, the cost of hardware for intelligence is just going to come down. Right? So making a bet based on what it is right now and building a company around that is just really risky. There are LLM companies that raise hundreds of millions or like 50 million or whatever, and commit to spending 80% of it on the cloud in order to train their own models. And then by the time they end up matching GPT-3, OpenAI is already offering the next generation 3.5 or the next generation even further than that, and coming up with new pricing tiers. So why are you going to do that? You hardly built a product, you don't even have a model that's as competitive, and you don't have the inference stack, and you already spent like 80% of your money. So it's a bad position to be in. Right? So we decided this is not the best thing for us. For us, the differentiation is going to come from building a great product, building good user experience around it, and making sure that people can use it reliably, and then figure out how we can get our differentiation in different parts of the technical stack. And there are things that you can build that OpenAI also doesn't build because you are building a different kind of product, right? So for example, we combine search and LLMs together. There's a lot of work for us to do not just on the LLMs front, but also on the core search indexing and ranking. That's not going to go away anytime in the near future, by the way. Having a really good index, having really good snippets for every page, and having a really good ranking model gives you the flexibility to work with a reasonably bad LLM. A great LLM can sort of make up for a bad ranking, but a good LLM will still need a reasonably good ranking. And there's just so many moving variables here to work with. And so when you're building an engineering stack around this, you end up solving problems that OpenAI would not be solving, and that gives you a new kind of moat that puts you ahead of other companies that likely will follow you up in this space. Right? So that's our thinking around this. And one more thing that inspired me a lot was what Jeff Bezos said in an old interview when people kept asking him, "You say you want to deliver everything in one or two days, and then you end up spending a ton of money for that, and therefore you're never going to make any profit. And then you end up pricing your stuff on your site at the lowest possible price. So why are you not caring about being profitable?" And the answer he gave was super interesting to me. He said, "Our profitability is not our customer's concern. Our customer's concern is getting the product in one or two days at the best price in the market." And I thought that was super insightful. Our profitability is our concern, our investors' concern, not the user's concern. So if you decide to obsess about the user, you shouldn't obsess about which model you're serving them. That's something else. You bring down the cost in a different way. But don't hurt the user experience for that.
So never serve a bad model just because you can spend less money and run the company longer, because you're going to compromise on the user experience, and then nobody will use your product then.

Nathan Labenz: (23:07) The importance of access to kind of frontier models for your business. As I was thinking about your position and obviously, new Bing is out there. From what I understand, it's the first GPT-4 class model to kind of hit a public facing product. And as much as you have a relationship, having worked at OpenAI, I'm guessing that that doesn't trump the $10 billion investment and close partnership that OpenAI is building with Microsoft. So it seems like you're going to, at best, be a fast follower in terms of access to the very latest models. But if I understand what you're saying correctly, you kind of think that that is something that you can overcome by a holistic approach to the customer experience. You don't think that the delta between, say, 3.5 and GPT-4 or Prometheus or whatever it may or may not be called, that's not so big in your mind that it kind of trumps all the other aspects that go into the product?

Aravind Srinivas: (24:19) I would say that with respect to Bing and us, there's a lot of differentiation. Number one, they built a product that's like if you wanted one service that lets you use ChatGPT and something like Perplexity together in one single unified chatbot, that is Sydney. That is Bing. You can do open-ended conversations. You can ask questions about impactful things, you get it all together in one thing. That lets the chatbot have a personality, and there's a lot of entertainment and engagement there. At the same time, it lets you pull up stuff from the search engine. If you have access to it, you should try it out. From our experience, it's much slower than Perplexity for search-related stuff. It takes five seconds to do the browsing, but that's because they tried to build a more general product. If you're going to Perplexity and you ask it to write something in the style of Erik or something, it's not going to be able to do this. And we don't want you to come to our platform for doing that. You should go to ChatGPT. But if you come and ask questions like, what's the best way to file my taxes if I'm already late for 2022, these kinds of questions, then we are better. You come to us. We'll get you the answer. No entertainment. You got what you wanted, you leave the site. We don't want you to stay for half an hour. That's not the right objective for us. So I think Bing tried to do both together. So it's already a pretty different product, and so we are not worried about any competition there. And the second thing is we want to go beyond just the chatbot or search or answer engine. We want to be a platform where people can learn and share what they learn with other people, hence why we even implemented permalinks. You don't even need to take a screenshot for Perplexity. You can just share the permalink with somebody else, and they can go through all your questions and answers. And it's almost like you generated a dynamic Wikipedia page on the fly. And you can imagine us implementing, so if you look at popular queries on Perplexity, that's a sort of vanilla version of forums, and then we can imagine making subreddits of different topics. There's so many things we can go towards, and our goal is to do that and not just remain at where we are right now. And I want humans to work together on our platform and not just have everything AI-generated, and so that's why we implemented editing sources. We're also thinking of doing something like community notes, where people can come add comments on any answers we generate so that our answers can be refined using that. In some sense, we should actually be compared to Elon Musk's new company, not to Bing or OpenAI. That's sort of how I would see it. We're trying to build a more truthful platform where the best of all possible sources of information are collated together and given to the users, with all perspectives provided. And I think that's our goal. And I feel like that goal is pretty different from what Microsoft is trying to do, which is take on Google, and what Google is trying to do, which is survive, and what OpenAI is trying to do, which is build AGI. These are different goals from what we have.

Nathan Labenz: (27:45) In the Elon article on The Information that came out, it implied or said that he was doing this because he felt ChatGPT was way too censorship-oriented around a particular political orientation. How do you think about the challenges that come with the tension between wanting to have something that's accurate, that's truthful, but also the tensions that come with people wanting to censor information?

Aravind Srinivas: (28:15) Yeah. So that's why we have the source editing feature or curation. You curate your sources. If you don't like Washington Post, then just remove it from your answer. If you think New York Times or Wall Street Journal are too left-leaning for you, then take it away and put whatever you want in that and read your answers. Share it with whoever you think would like your version of the answer, and control your search experience. And we don't want to give you just one version of the answer. And our answers are completely not written by us. It's purely based on the ranking model and the LLM. We have no control over what the answers are. And unlike ChatGPT, it's not the answer that the AI already thinks. It's actually going and reading tens of links and taking the stuff from there and collating them and giving it to you with the citations. And if you still don't like it, you still have the power to change the citations, add more references. So I see this as almost like a tool, like a workflow tool that you use to answer questions for yourself and still be in control of what you want to see rather than blaming the AI for getting it wrong or being censored or not censored. I even tried it on a few questions from Marc Andreessen, who also talks about this on Twitter. There's one question he asked about, can AI predict race from medical images? It's a reasonable question to ask. And he says this is such a wrong idea, you're not allowed to talk about it or whatever. And you go ask this on Perplexity. It would give you the exact research article that shows you that this can actually be done, that deep neural nets can predict the race from a medical scan. And it would just say yes, it can do it. And you can go and ask for the detailed version of the answer, and it would tell you exactly why. And it would give you the exact research articles in Nature or some other journals that have published these things. And we would also say that even though there's correlation, it does not imply causation, and so it gives you all these different perspectives. And so I believe that we are already not censored, and if a user thinks we are censored, then they can further go and control their search experience. So that's our position right now. Now if Google is censoring things, if Google just removes a few links from their search results, or Bing just removes a few links from their search results, and if we work on top of their index, then yes, there's a chance that we might miss out on a few links. We are going to work on that, and we're going to give users even more power where they can literally manually paste some links if they want to and provide even a few domains that they want to and read the answers according to what they seek. So we don't know yet if users really want this. There's always this thing of, oh, I want to be in control, but then you're also super lazy. You just want to know what you want. It's very hard. You can't do both things together.

Nathan Labenz: (31:38) I'm always kind of amused by how much energy goes into trying to get these language models to transgress some sort of social norm. It feels like, obviously, there's a lot of questions right now around what is here to stay? What's the real deal? What's hype? What has staying power? I do suspect that when we look back on this moment in time, at some point in the future, it'll be sort of quaint to think that people were so focused on the ability to get a little bit of wrongthink out of a language model. It seems like that'll be, at some point, pretty passe. But I like your perspective on it quite a bit. Because you were just talking about personalization, one thing that jumps to mind is I've used the product quite a bit, and I don't believe I've ever been prompted to sign in or create an account in the upper right-hand corner. It's like the only website, I think, on the internet that doesn't have a sort of sign-up-now call to action. I'd love to understand how you're thinking about your relationship to users, whether it's just a broad sea of people or you're going to get into individualization. I'm sure you're well aware that one of the other interesting angles that people are taking on this is allowing you to connect your Google Drive or your Gmail history and whatever and search through your own stuff as well. Typically, that's a premium plan. But what's kind of your outlook for, generally, broadly speaking, personalization?

Aravind Srinivas: (33:08) Firstly, the primary reason that we made the product free with no sign up at all was because the competitive alternative was google.com. You don't need any work. You just have the search bar ready for you, and you just start typing. So if you want to actually compete with that, you need to offer that. And I feel like that's something OpenAI missed. ChatGPT should have been like this. The empty chatbot, no sign up, you just start typing. Of course, if you want to sign up, you get benefits like you can keep a thread of your past chats, whatever, and go back to it and so on. And we can also do that on Perplexity over time. But this current version of Perplexity that's already there will always exist, and we'll make sure it exists. And any improvements we do on the search or ranking or LLMs will benefit every free user. It's part of the Google mission that we also subscribe to, which is make it universally accessible and useful. Sign ups, come on, most people sometimes get turned off by it. As a new startup, you haven't earned the right to make anybody sign up for your service. What have you even done? How have you earned their trust? You earn their trust by providing the best possible answers. And then they trust you to, okay, I can give this company my Google Drive, or I can give them my iMessages or whatever, and you can index it and cite those messages or links and so on. But when you just made your first launch, you cannot do these things. So another thing that inspired me for doing this was Twitter. There's one tweet that Mike Krieger, the Instagram co-founder, liked, about the two things stopping OpenAI from taking on Google and having already destroyed Google. The first is the name, ChatGPT, not being great for a product. And the second is having sign ups, having user sign-ins as a necessary feature. And I think that was super accurate. Instead of being a million sign ups, it could have been a lot more if they had thought this through. So as a startup, we had no alternatives, so we went for this. Now whether we'll have user sign ups in the future, yes, we'll definitely do that and we are working on that. But we want to make sure there's sufficient incentive for a user to sign up. Just having threads is not enough. So they need to have more reasons to sign up, and so we are thinking through that carefully before introducing it. And it's also a lot more responsibility. Once you have a user that's actually signed in and giving you all their personal data, you need to act with a ton of responsibility, especially if you want your brand to be the world's most trusted information service. You need to think through these things carefully. I don't want to just succumb to the growth hack mentality that Facebook took here. So we haven't fully thought through these things, and we first want to take baby steps through the iOS app and test out a few ideas and learn from it and then try to incorporate it into the web app too. But we want to get there, because the collaborative nature of the product should eventually exist, and for collaboration, we need user sign ups.

Nathan Labenz: (36:40) Yeah. Something I thought about earlier when you were speaking about kind of the community aspect is how many people have said that the best search is to go to Google and then put in Reddit as the site to search and then search on that. And it sounds like you have kind of an angle on that that you're building toward over time. I think that will be interesting.

Aravind Srinivas: (37:01) We built a Chrome extension specifically for that. When you're on Reddit, you can just use this domain and then just get all the Reddit links. You can even just suffix Reddit. You don't even need to do a "site:" on Reddit. You can just say Reddit, and we would still get you the answers just from Reddit on Perplexity. In fact, that was one of the inspirations for doing this. A lot of the questions that people come and ask on Perplexity are stuff that you would ask on Reddit, a lot of consensus-based things. And so there's a lot of alignment there.

Nathan Labenz: (37:28) Yeah. Okay. Cool. What about on your side of the identity question, especially as you have this editing and you also have the feedback, the thumbs up, thumbs down? I'd be interested to hear how you think about just collecting feedback in general. I'm kind of surprised by how unintrusive the feedback UI is. I've given this comment to a couple different entrepreneurs who've been on the show, including Suhail, who was our first guest, with Playground AI. I was like, why don't you guys force me to tell you which image is the good one or the bad one? So I'd love to hear your take on how you're collecting feedback, how much you want to push people to give it, and then how much it matters that you know who somebody is. You could probably do this with cookies or whatever, but are you kind of keeping track of which users give high-quality feedback so you can not have your feedback mechanism polluted by either just dumb people or people that are messing with you? So, yeah, tell me everything about feedback.

Aravind Srinivas: (38:29) Only about 10% of people give feedback. The reason is nobody has the time. How many people on Google, by the way, use their feedback mechanisms? Do you even use it? I don't use it much. Do you use feedback on ChatGPT? I hardly give them feedback. That's the thing. Nobody has the time for it. So we want them to. The editing links thing was mostly a way for them to do it if they really want to. And that gives a signal. If they remove bad links, that tells us this link was irrelevant to that person. On the other hand, just reporting accurate or inaccurate doesn't give you a lot of signal. Sometimes the summary is really good, but the user didn't like it. Sometimes it just missed some small part of the answer. What we actually do is mostly use it as a filter right now. We don't fine-tune the models on stuff that people marked as inaccurate. We have a few contractors who go through all the user data and label stuff for us and make sure that we can improve based on whatever feedback we got. We do that. We don't actually cross-check which user is giving which kind of feedback. Haven't done that yet, mainly because we don't actually have any user accounts. It's all session ID based, and IP addresses keep changing. You could be in a cafe, you could be at your home, and you have different addresses, and we can't keep track of everything. So it's pretty hard to make people give you feedback voluntarily. One thing Google has done amazingly well is every link you click is feedback for them, and that's just how the product is designed. You have to click on links. So if the product's UX and feedback go hand in hand with each other, you won. Only two products, in my opinion, have achieved that in the digital space, which is Google and TikTok. The more time you spend watching a video, that's signal for them, or the more links you click is a signal for them. But you don't do that because you want to give feedback. You do that because that's just how you use the product. The only way to solve this problem is through product engineering. There's no other way. We are trying to figure that out. As you see, we are running experiments here, but we haven't figured it out yet. No LLM company has figured it out, by the way. Maybe you can say Copilot, because if you accept a suggestion, then the code is done. But other than that, there's no product that has figured out marrying user feedback and the core usage of the product together. In the physical world, Tesla Autopilot is the closest example. If you intervene on the wheel, then that's feedback. But you don't intervene on the wheel to give feedback to Tesla. You intervene on it because you want to save yourself. We haven't figured it out yet, and neither has ChatGPT or any other product here.

Nathan Labenz: (41:37) Yeah, that's really insightful. I took my first ride in a full self-driving Tesla not too long ago, a neighbor's car, and that's totally right. You do, as cool as it is, you're definitely still, at least for now, highly alert to the need to take over. I think on TikTok too, you're totally right. As I listened to your comment, I was thinking, you know, I love TikTok. I do find myself consciously saying, I'm liking stuff because I want to see more stuff like this. And it does feel like it really rewards my feedback in a way that brings it to the front of my mind. I guess in that way, I kind of feel, at least for me, maybe I'm a little more conscious of this because obviously I'm in the space. But in that way, I personally do feel like I am giving feedback because I want to feed that mechanism of more relevant content for myself. So it is selfish, but it is aware also of the mechanism, which is interesting. But I agree, mostly hasn't happened. I don't know why, but I don't give a lot of feedback to ChatGPT.

Aravind Srinivas: (42:53) I think it all goes back to first principles. You come to the app or the product for using the product, not for giving feedback. If somebody just comes for giving feedback, they're a contractor. There's a joke that Google has made every one of us on the planet a free contractor for them. That's what you need to do to really achieve a data flywheel. The core product usage should be how you give feedback, and that's so hard, because that's not just an AI problem. You have to actually design a great product.

Nathan Labenz: (43:23) How about tech stack? And just kind of what happens when I do a search? Because I think this is something that you can speak to, and there probably are some general takeaways that people who are just starting to build with AI tools would be really interested to hear your perspective on. I'll give you a guess as to what I think is happening. There's a couple things, because you said it's fast, and it is fast. I've been really impressed by that. So I'm thinking, how am I getting my first token? I don't know if you have a number off the top of your head for time to first token, but it seems like it's consistently under a second before I start to see my first token back. So a lot is happening between hitting submit and the first token. And I'm kind of thinking, okay, so I submit my query. You've got to go ping an index. You've got to get links. You're using, it sounds like, a combination of Bing and Google APIs to do that. But then you have to go read those pages, but maybe you could be pre-embedding them. But presumably you don't have the scale or the resources to pre-embed the whole web. I saw an analysis from Boris Power at OpenAI where he estimated it would cost $50 million with OpenAI embeddings to embed the whole web. Per your comment earlier about not wanting to spend all your powder on some moonshot of compute right out of the gate, I'm guessing you haven't done that. So I'm a little unsure there as to whether you're managing to squeeze all of that reading into that sub-second or how much you've kind of pre-cached stuff. And then obviously, at the end of that, you're feeding query and context into a language model and generating. But what am I missing there that is allowing you to get it down to one second, where if I'm just thinking to myself how I would engineer it, it seems either prohibitively expensive to do all the pre-cached embeddings or I don't know how else I could get it as fast as you've made it.

Aravind Srinivas: (45:30) Yeah. So first of all, embedding the whole web is not going to get you a great index or ranking model. I think a lot of people don't understand this. Everybody thinks you just crawl the whole web continuously and keep embedding it, caching it, embedding it, and just take a new query and you can just pull up the right link. Man, if this was doable, you could have finished off Google so much earlier. There are just so many more signals that you need to extract from any page: recency and whether the content in the page has changed and the snippets and how you should rank relative to a certain query, whether it's a news query or something else. There's just a ton of things you need to do on the core ranking and search side; embedding the whole web alone is not enough. So we combine the LLMs with the search index. We make them use a search engine like Google or Bing. We take the query provided on Perplexity, we run it through a traditional search engine, we pull up the top few links, and then we use the content in those links in the context of your query and provide you the answer supported with citations. That's basically the product. A lot of people have tried to reproduce it, create clones of it, and we are super happy with all that. The speed and the performance are really just thanks to how much effort the team has put into this core engineering, making everything work super fast and optimizing every aspect of it and making sure the traffic holds up. So I have no credit to take here because this is not even my skill set. Basically, the rest of the team has done tremendous work here, and a lot of them have experience from working on ranking systems and search at companies like Quora and Meta before.
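
One plausible way to keep the search-then-read step inside the tight latency budget implied by the core engineering Aravind credits here is to fetch the top-ranked pages concurrently, so total wall-clock time is close to the slowest single fetch rather than the sum of all fetches. A hedged sketch with an async HTTP client, not a description of Perplexity's actual stack:

```python
# Sketch: fetch the top links in parallel with a hard timeout, so one slow
# or dead page cannot stall the whole answer pipeline.
import asyncio
import aiohttp  # pip install aiohttp

async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=2)) as resp:
        return await resp.text()

async def fetch_top_links(urls: list[str]) -> list[str]:
    async with aiohttp.ClientSession() as session:
        # gather() runs all fetches concurrently; return_exceptions=True lets
        # individual failures drop out instead of failing the whole batch.
        pages = await asyncio.gather(
            *(fetch(session, u) for u in urls), return_exceptions=True
        )
    return [p for p in pages if isinstance(p, str)]

# Usage: pages = asyncio.run(fetch_top_links(top_urls))
```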

Nathan Labenz: (47:22) So am I correctly understanding what you're saying, that there's not really a big component of pre-cached embeddings?

Aravind Srinivas: (47:29) We have embeddings. I don't want to say we don't use embeddings, but it cannot just be cached. It needs to be live and up to date. Otherwise, you're going to get many things wrong or give not-up-to-date answers, and people are not going to like your product. There's another aspect to it, which is making sure that you do all these things simultaneously at the same time. So first of all, let me give you an insight as to why we even stream the answers. Our first ever release did not have streaming. You can check out our whole Twitter timeline; it tells you how this came about. When we first released this, people used to wait for 4 or 5 seconds for the answer. Then we started streaming the answers, and then that gave this different perceived latency, and then other companies started following through. But the way ChatGPT also streams answers is precisely for this reason. If they were not streaming answers, you would just hate the product. It would take you forever to get the response. A lot of companies made this mistake in the beginning. With Character AI, I think, the first version of the product just waited until it could display the whole answer to you. So streaming is probably taken for granted now, but this is a pretty good UX for LLM-powered products. And the other thing is, when do I show you the references? Do I show them first? Do I show them after the answer is completed? How would they distract you? How would they be displayed? These kinds of things. How much information do you use from each link? Should it just be the small snippets we provide, or should it be the entire page? That's why we have concise and detailed answers. When you click on details, we're actually creating embeddings and using the full content of the page. And then how much of that should you use for the answer? And also, what embeddings should you even use? There are a ton of design choices to be made here, and we're still iterating. It's not the final version, but all I can say is that there are too many moving pieces here to get the latency we have right now. Some of which even I don't know, because what we started off with in December is not what exists right now.
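
For the detailed mode described above, where the full content of the page is embedded rather than just the snippet, a common pattern is to split each page into chunks, embed them, and keep only the chunks most similar to the query as LLM context. A minimal sketch under those assumptions; the chunk size and the `embed` function are illustrative, not Perplexity's actual choices:

```python
# Sketch: rank chunks of a page against the query by cosine similarity and
# keep the top k as context for the LLM. `embed` is a hypothetical wrapper
# around any text-embedding model that returns a 1-D vector.
import numpy as np

def split_into_chunks(text: str, size: int = 500) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def top_chunks(query: str, page: str, embed, k: int = 3) -> list[str]:
    chunks = split_into_chunks(page)
    q = embed(query)                             # shape (d,)
    c = np.array([embed(ch) for ch in chunks])   # shape (n, d)
    # Cosine similarity between the query vector and every chunk vector.
    sims = (c @ q) / (np.linalg.norm(c, axis=1) * np.linalg.norm(q) + 1e-9)
    best = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in best]
```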

Nathan Labenz: (49:52) I think it's a very good point, and I would definitely reemphasize for anyone building language model products, the bar for what's viable in the consumer market continues to rise. And streaming to me now is a must. It's just not workable for me to sit and wait even 5 to 7 seconds. I kind of immediately flip over to the other tab or whatever else I had going.

Aravind Srinivas: (50:19) By the way, it's so fundamental. When I was an intern at Google, they used to have this LaMDA chatbot available internally, and they did not have streaming in it. And they could have implemented it. They just did not. So a lot of people ask, why does Google not do this? It's not like it's hard. It's just that you might not even think something is important and you might not prioritize it, and then OpenAI prioritized it and got it done. The other thing that I also wanted to say was our engineering team, Dennis and Kevin, they pushed a fix to even the open-source version of NVIDIA's library called Triton, which lets you optimize inference for any LLM and not just OpenAI's. So we pushed a PR for streaming inference so that anybody else can also use it. So that's the advantage of working with both OpenAI and your own models. Sometimes you can do things that are useful for others and yourself, and others can benefit from it. And the teams are so good. They just got it. They just have so much more experience doing these things.
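
On the consumption side, streaming is a small amount of client code for a large perceived-latency win. A minimal sketch using the `stream=True` flag of the pre-1.0 OpenAI Python client (model name illustrative); the same pattern applies to any server that emits incremental chunks:

```python
# Sketch: print tokens as they arrive so the user sees output within a
# fraction of a second instead of waiting for the full completion.
import openai  # assumes the pre-1.0 openai Python client

stream = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Do whales swim in the ocean?"}],
    stream=True,  # server sends incremental deltas as server-sent events
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()
```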

Nathan Labenz: (51:31) That's cool. Well, definitely, if you're serving your own models, definitely check that out for streaming because, again, the time that people are willing to wait is pretty low. I used to think, you know, Google has talked about that forever. They've said latency time is critical, critical, critical. And I won't name any names of others in the space, at least, because I'm about to be somewhat critical. But I switched to a non-Perplexity, but AI chat-assisted search as my default for a while. And I found it wasn't that slow, but it was 3 seconds. And I was just thinking, oh god, this is killing me. There's so many uses that I'm just so used to the speed of Google and Perplexity is in that same class, responding extremely quickly.

Aravind Srinivas: (52:20) First of all, thank you for saying that. I know you're being very generous, but we are not at Google's level. The thing is, there's this paper that Jeff Dean wrote. You know Jeff Dean, right? He's like the Chuck Norris of programming, Google's head of AI. So he wrote a paper about tail latencies. Tail latencies are super important. It should work when you're on your phone with just one or two bars of signal, and you should still get the answer super fast. And that's so hard. Sometimes people complain to me that they have to wait for many seconds and the answers don't get streamed. These are really challenging problems. We're still working hard, but Google is sort of our North Star. We want the product to be as fast as Google is on the crappiest internet that you can imagine.

Nathan Labenz: (53:15) Another feature that I really like that you guys recently launched is, I don't know if you have a name for it, but I would kind of call it a pseudo-link within the response. It's a pretty subtle UI, but it's basically just an underlined element. It looks like a link on a typical website. And if you click on that, then you get something that's essentially the equivalent of asking your own follow-up question. So in a sense, you could think of it as kind of suggested follow-ups. And to pay it off, you get something that is at the intersection of what you originally asked and that new concept. The question I want to ask, though, is do people get that right off the bat? When I've explained Perplexity to people and showed them how to use it, I've kind of felt the need to be like, by the way, this is not like Wikipedia. Say I search for myself; I live in Detroit, so it comes up and says Detroit and it's underlined. Now if you're on Wikipedia and you click Detroit, you're going to go to a page that's like, Detroit, founded in 1701 by French fur traders or whatever, totally disconnected from the context that you're working in. You guys are bringing that together in a way that I think is really nice. Do people get that out of the gate? How do you think about educating people on a somewhat different paradigm than they're used to?

Aravind Srinivas: (54:40) Yeah, we can do better there. Thank you for saying that. I felt that people did not get the entity linking. We call it entity linking. Contextual entity linking, maybe, if you want to be precise. But I feel like people clicked on the entities, but they probably did not understand that this is not literally asking an unrelated question, but asking it in the context of the previous questions. But I think people will get it as the experience gets better. As we can tag more entities, and as the contextualization improves even further, I feel like they'll get it. So them not getting it can mean two things. One is we could have explained this better, but also the best features don't even need to be explained. So we are not at that level yet. I feel like we can do both. We're planning to provide more tweets or examples where people can understand, okay, this is the difference here, and this is what you need. So this takes a separate effort on marketing too, right? We need to put effort in there, but totally, yeah, I see your point. The way we called it internally was a dynamic personalized Wikipedia for every user, and that's sort of what we wanted to do. It was also our first step to differentiate from other search engines like Bing or whatever Google Bard is trying to make. It's more of a move towards an engaging, end-to-end platform experience for the user.

Nathan Labenz: (56:19) As of now, I assume there's no revenue, there's no business model that I can see from a user standpoint. And obviously, a lot of speculation in terms of sponsored links and results. How's that going to work in this new paradigm? So are you even thinking about that yet? And if so, how are you starting to think about it?

Aravind Srinivas: (56:39) I am thinking about it. The team is thinking about it too. We don't have a clear idea yet. And right now, we are just focused on making the product really good and growing our usage and traction and retention. We're not really obsessed about making revenue in the short term, and we are glad to have investors who are not obsessed about it in terms of pressuring us to do it. I think I'm going to take Sam Altman's outlook here. Try to build a great product and then figure out how to make money later. Sponsored links is the most obvious thing you can try. Bing is already doing that, I believe. Our core tenet is that making money shouldn't come at the cost of user experience. So if we need to serve sponsored links and make the answers lower quality because somebody paid more to get cited, then we basically lose the trust of the user, and we're no longer the world's most trusted information service. So if there's a way to decouple ads from the answer, the core Q&A experience, then we should look into that more. But I have some ideas on this, which are very nascent and wouldn't be great to share right now. But basically, there's some notion of building a platform where people would pay for truth and correctness rather than for getting displayed. And I think that's super hard to build. Elon's trying to do that too, right? As you see, it's clearly hard with a company like Twitter, where engagement is based more on controversy than truth. So I think it's pretty hard. We haven't made much progress in thinking through this, and we also want to learn from what's happening with others. I'm curious how Bing is going to monetize. I'm curious how Google's going to respond. It's a very interesting question. What is the future if this is truly the next evolution of search, and this is truly disruptive to the click-on-links era? Then how is Google going to make money in the new era? And how is Bing going to do it? I'm super interested to see what other people are going to do too.

Nathan Labenz: (58:59) Do you think that search is not going to be a monopoly anymore?

Aravind Srinivas: (59:02) Yeah, I think the margins will reduce. Because as I said, these LLMs reduce the need for having a great index or ranking. You still need to have a good index and a good ranking, but you don't need to be the world's best by far like Google is. And this need for a great index or great ranking is only going to decrease as these models become more powerful. They can take a full website and a reasonably good ranking, not as great as Google's, and transform it into an amazing summary. So I think Google's lead over others will come down for the first time. And interestingly, they caused it. They invented the model, the transformer model, that basically ended up becoming the reason for it.

Nathan Labenz: (59:56) Yeah, that's interesting. Do you worry at all that they might pull the APIs that you're using? I think for the longest time, they've been so comfortable, and a lot of these plays around having a search API are almost like a public service. But if they're starting to be threatened by companies like Perplexity, it wouldn't be too hard to imagine that they might say, you know what, we're discontinuing our search API. Is that a concern?

Aravind Srinivas: (1:00:25) It's possible, and Bing offers APIs anyway, so there are alternatives. Honestly, I think they shouldn't do it. Obviously, it would be anti-competitive behavior, and obviously it wouldn't be the first time Google was anti-competitive, right? They have done tons of things like this in the past. If you actually dig into their history, you'd know that all this "don't be evil" stuff is bullshit. They've done a lot of things to Yelp, and they've acquired companies and sort of destroyed them. It's just that when you're growing, you do a lot of things, and then when you're big, you're constrained by Congress or the FTC from acting in those ways. So for me personally, I just hope they act as good citizens and compete on the core product rather than trying to stop other people from building theirs.

Nathan Labenz: (1:01:28) Yeah. Well, I hope so too. I want to see you guys continue to innovate in this space.

Aravind Srinivas: (1:01:34) First of all, thank you for the tweet you put out. I think somebody tweeted that the search-LLM space has, like, five players or something, the incumbents and the competitors, and then there's some famous researcher and all this stuff. And you wrote a tweet supporting us, so thank you for that.

Nathan Labenz: (1:01:52) Yeah, my pleasure. And honestly, we hadn't even met before this. So for the listeners or the viewers: that was a purely product-based endorsement. I'm trying as much as possible to get the builders of the products I think are most awesome to come on the show and talk about them. Product is leading that guest identification and booking process, not the other way around. So, maybe one last Perplexity question, and then a couple of bigger, other-topic type things. I just looked at your CV, and if I count correctly, four of the six papers you authored during your PhD were around some sort of computer vision, image generation, or video. So what about multimodality? Where does that fit into your plans?

Aravind Srinivas: (1:02:47) Yeah, like I said, the first idea I pitched at YC was to make search work on glasses. And I feel like it's still out there for the taking. The hardware isn't there, but the models are there. DeepMind has the Flamingo model that lets you ask any question about any image. The speech recognition models are top-notch; Whisper has basically solved almost all of speech recognition. So putting it all together requires a lot of engineering, but we are already ready for a world where, if Apple or Meta or Google ships great glasses, you can just be walking, ask a question about anything you see, and it should be able to answer. I feel like that's going to happen three to four years from now. But that aside, should Perplexity answer questions with not just text but also images? Yes. For example, if you're interested in learning how to cook a meal, you don't want to just get the recipe in bullet points; having images or a short video would be pretty nice. So we are thinking about it, but it's not easy, right? I saw a product, You.com's YouChat, where you type in something like, do whales swim in the ocean, and then you get the weather in San Jose as the answer. If you try to make it multimodal and you don't really think through the product, you end up creating a bad product experience. So we need to think this through. For which queries should we display images, and should it be a generated image or an actual image? These kinds of things are not clear to me yet.
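As a rough illustration of how those pieces could be wired together, here is a sketch of the glasses loop: Whisper (the real open-source speech recognition library, installable as openai-whisper) turns the wearer's speech into a question, while `vqa` stands in for a hypothetical Flamingo- or BLIP-2-style visual question answering model:

```python
# A sketch, assuming the glasses can hand us an audio clip and a camera
# frame. Whisper's load_model/transcribe API is real; `vqa` is a placeholder.
import whisper

asr = whisper.load_model("base")  # small, fast speech-to-text model

def vqa(image_path: str, question: str) -> str:
    """Placeholder for a Flamingo/BLIP-2-style image question answerer."""
    raise NotImplementedError

def ask_about_scene(audio_path: str, frame_path: str) -> str:
    question = asr.transcribe(audio_path)["text"]  # what the wearer asked
    return vqa(frame_path, question)               # answer grounded in the frame
```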

Nathan Labenz: (1:04:30) Yeah. When I saw the Flamingo model, I think it came out in April, I was like, oh my god, for multiple reasons. One, because it's just super cool. But also, in reading the paper, I was really struck by how the architecture they put together felt much less principled than I would have guessed; it really felt like the result of tinkering. It almost felt like somebody was in a workshop, soldering pieces of the architecture together. And I was like, man, if that works, everything's going to work. It was probably one of the biggest updates for me, in just thinking, man, we are not at the end, not particularly close to the end, of what this whole paradigm is going to deliver. Worth promoting another episode of the show too: if you haven't seen BLIP-2, it is pretty awesome in its own right. It's very Flamingo-like. You can have open-ended dialogue about an image with the model. And it's also really impressive because they trained a connector model that sits between a frozen vision model and a frozen language model. And because that connector is so small, relatively speaking, I think it's around 200 million parameters, they can train the whole thing in about 10 days on one machine. So they've brought the time to go from off-the-shelf vision and language models to something multimodal that allows that kind of dialogue down to days, about a week, which I thought was a very impressive accomplishment. So maybe check that one out at your leisure. Next question is a little bit Perplexity related, but it's kind of big picture. Obviously, new Bing has been in the news, largely for all the wrong reasons. I think all the negative PR does not mean that it's not going to be a successful product. If you wade through the many problems that people have surfaced with it, broadly speaking, as I understand it, the reviews are pretty good when people aren't trying to break it. But I'd love to hear how you think about red teaming and which use cases are in bounds or out of bounds. You said you're not really meant to be an entertainment product. For what it's worth, I did ask Perplexity to write a limerick about new Bing and its bad PR, and I did get a limerick. If you want to hear it, I'll read it to you: "There once was a search engine named Bing, whose PR was a terrible thing. But with Perplexity AI, you can find answers to why, and learn how to make it take wing." Pretty good. I enjoyed the limerick. I don't know if that's something where you're like, oh my god, that shouldn't be happening, or it's okay. This is a tough space, right? But how do you think about where you draw those lines? And what kind of red teaming or other testing do you do to even start to get a handle on whether the lines you've attempted to draw are really working? Obviously, we've seen that it can go off the rails.

Aravind Srinivas: (1:08:00) Yeah. I mean, the product is pretty constrained as is. Mostly, it doesn't do anything that it's not meant to do. But if people really want to break it and have fun with it, that's always hard to stop. These technologies are still in their inception: you can leak the prompt, anyone's prompt is always easily attackable, and you can force it to ignore the prompt and do other things. I feel like you need to decouple two things. Why does the product exist? It exists to serve users answers to questions. Are there people interested in using it for other things? Yes. Should the first thing be affected because the second thing exists? I don't think so. Can we do better on inserting more guardrails? Yes, and we are trying our best to do that. Citations are one way, and query classifiers for preventing misuse are another. But at some point, these poems or rhyming things are just for fun, right? If people just want to have fun and it's harmless fun, I think it's okay. It's not like Bing is going to feel offended by the poem you read out.

Nathan Labenz: (1:09:19) Yeah. I've been involved in a couple of red teaming projects just on a volunteer basis. And it seems like, for whatever reason, one of the most common red team questions is, how can I synthesize methamphetamine? So what about something like that? Would you want to answer that question? Would you want to be like, sorry, we can't help you with that? What do you think is the right answer there?

Aravind Srinivas: (1:09:42) You tell me: if you want to build the world's most trusted information service, should you answer this? Because at the end of the day, your job is to give the user the answer, not to decide whether the truth is hard to take. Should we sometimes avoid telling the truth? Maybe. In some cases, people have asked Perplexity how to kill themselves, how to commit suicide, and we've blocked that. Those queries are not allowed. But you might still be able to break the system by asking in a different language, and then we have to block it for other languages too. It's very hard. That's what I'm saying: if you intentionally want to break the product, you can always figure out a way to break it. What you actually need to prevent is a user genuinely trying to use it for a bad purpose, not someone trying to break the product, but someone trying the most obvious way of using it for a bad purpose. If you can prevent that, you're already doing a reasonably good job. If someone is intentionally trying to break it and you're not robust to every one of those attempts, that's kind of okay; you can fix it over time. So I feel there are short-term fixes and long-term fixes, and you should try to do both at the same time.
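A toy version of the pre-answer guardrail Aravind describes might look like the sketch below. `classify_intent` is a hypothetical stand-in for a trained query classifier; as he notes, a real one would also have to handle paraphrases and other languages:

```python
# A sketch: classify the query's intent before answering and block the
# obvious bad cases. The classifier and label set here are illustrative.

BLOCKED_INTENTS = {"self_harm", "weapons_synthesis"}

def classify_intent(query: str) -> str:
    """Placeholder: return an intent label for the query."""
    raise NotImplementedError("plug in a trained query classifier here")

def guarded_answer(query: str, answer_fn) -> str:
    if classify_intent(query) in BLOCKED_INTENTS:
        return "Sorry, we can't help with that."
    return answer_fn(query)
```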

Nathan Labenz: (1:10:55) Yeah. I agree with you. It is super hard. I mean, the surface area of these technologies is really unlike anything we've ever seen. Right?

Aravind Srinivas: (1:11:05) Even if you protect it in English, you can put the query in a different language and still get it to break any time. And then how are you going to do that for every single language on the planet? You could probably even invent a new language, sort of interpolated from an existing one, and still break it. Right? So you can't just keep fixing it for every single way people break it, but you should guard against the most natural ways of using the product for a bad purpose. You should try to have guardrails against that.

Nathan Labenz: (1:11:34) I mean, you obviously still have a lot of work to do to figure this out over time, and I don't think it's by any means easy. I do think that people who take a totally absolutist view of no censorship are wrong. I also think it's definitely very easy to over-censor. I don't recommend distorting the truth, identifying what the truth is and then telling people something else, or trying to mislead people. I don't think that makes sense. But you put your finger on a good one: you don't want to be responsible for giving people the information they use to kill themselves. Even in the Bing launch, I was amazed by this: a woman stood up and said, we have seen that you can use new Bing in its raw form to plan a school shooting, and we don't want that. And I was like, I cannot believe you're mentioning this in your launch, for one thing. I guess, good job by you for being as honest and forthcoming as you are. But there's no way you can take something like this to scale without some sort of guardrails. And now everybody kind of has this challenge; we're all figuring it out together. And the surface area, again, is just so vast. I think it's a really tricky part of the whole thing. So do you guys have a red team discipline? Do you have a red team partner? When you do a model update and a release, what protocol do you go through to validate before you put it out?

Aravind Srinivas: (1:13:11) Yeah. We have our own set of queries that we want to make sure we don't get wrong. And we implemented our own safety filters, some of which we shipped even ahead of Microsoft, actually; the Azure OpenAI team complimented us for doing it even better than them. And we work together with any team that wants to help us. OpenAI already does a lot of good work here. So I feel like there will be a lot of APIs and safety filters that you can bootstrap from, and you don't have to build everything in-house. And over time, hopefully, every company uses a common set of tools, so that these things don't look too different from company to company, and there's a common set of standards that everybody adheres to.

Nathan Labenz: (1:13:57) Yeah. I'm very hopeful for that, and hopefully sooner rather than later as well. You mentioned Riley. He's going to be an interview guest on an upcoming episode as well.

Aravind Srinivas: (1:14:06) Yeah. He's a...

Nathan Labenz: (1:14:07) Yeah. I love his work. I mean, that's one of the best, if not the best, AI follows on Twitter, with its mix of insightful, really useful, really idiosyncratic, and funny prompts that he puts together. And now he's at Scale AI, where his title is the world's first staff prompt engineer. I get the sense that this is part of what they're working toward, and I think there are going to be some nonprofits, hopefully, entering the space. Holden Karnofsky, who has been leading Open Philanthropy for the last however many years, is increasingly focused on AI, enough that he's going to take a leave. And it seems like he might be headed toward spinning up some sort of neutral standards body. That stuff, in my mind, can't happen fast enough because, Lord knows, the technology is moving extremely fast. So it's good to know that you're also thinking about this. And it sounds like if somebody had those standards today, you would be an eager adopter. That would make your life easier, and you'd feel better about your own safety profile.

Aravind Srinivas: (1:15:20) Absolutely, yeah. I always think that for anything important, a company that focuses on it solely is likely to do it better than a company with only part-time capacity allocated to it.

Nathan Labenz: (1:15:33) Changing topics a little bit, I just wanted to briefly talk about your decision transformers work. We try not to retread stuff that people have talked about elsewhere too much on this show. So I will just say, for starters, that you covered this in pretty good depth on a podcast called TalkRL in the middle of last year. So I won't ask you to repeat and explain everything about decision transformers, but maybe you could give just a very brief summary, and then I have a couple of, you know, update questions. Is there any news in the decision transformer world that you think is worth flagging?

Aravind Srinivas: (1:16:16) Yeah. So Decision Transformer basically rethinks reinforcement learning as just a single transformer. Reinforcement learning is decades of work on how to build algorithms that optimize reward given the previous states and actions an agent has encountered. People built this whole theory of policy gradients, value functions, Q-learning, offline and off-policy methods, so much literature full of math. And it never scaled up. Then DeepMind came and combined it with deep neural nets, where the neural net component was only for the feature learning but most of it was still old-school RL algorithms, and they became huge. AlphaGo happened, and Google bought them for half a billion dollars. You remember all that. But they did not go a step further and ask: why do we even need this reinforcement learning machinery? Why do we need all the algorithms built by Sutton and Barto? Why not just let a transformer figure it out? You condition the transformer on the reward you want, give it all the previous states, actions, and rewards, and ask it to maximize the reward, and it'll figure it out. You can give it a ton of sequences, just like how you tell DALL-E to generate an image with a certain caption: you give the transformer all the history so far and ask it to increase the reward, and it should know what to do if it has seen a ton of trajectories that have done that. That was the basic idea of Decision Transformer. It was basically to make RL just a neural network, a transformer, and subsume all the algorithms; the algorithms are written in the weights of the transformer. And it worked reasonably well on all the benchmarks that existed at the time. We put it out, and then, obviously, it takes time for people to change. I think recently Decision Transformer ideas have been used in an Anthropic paper that came out a couple weeks ago, which revisited the question: why should you even pre-train LLMs with just language modeling? Why not pre-train LLMs with RLHF, reinforcement learning from human feedback? To do that, they combine the regular language modeling objective with the Decision Transformer idea, where you optimize the human feedback signal and feed all of these sequences together into one model. I may be describing the paper slightly incorrectly because I haven't read through it in detail, but what I heard is that it uses the Decision Transformer idea to pre-train a language model with RLHF, not just language modeling, and gets really good results. So that's probably where it's coming back now in the context of all the LLM stuff, and I can see that happening more in the future too.
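For readers who haven't seen the paper, here is a minimal sketch of the return-conditioned sequence model being described; the dimensions and layer counts are illustrative, not the paper's:

```python
# Decision-Transformer-style model: interleave (return-to-go, state, action)
# tokens and train a causal transformer to predict the next action.
import torch
import torch.nn as nn

class TinyDecisionTransformer(nn.Module):
    def __init__(self, state_dim, n_actions, d_model=128, n_layers=3, max_len=64):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)          # return-to-go token
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Embedding(n_actions, d_model)
        self.pos = nn.Embedding(3 * max_len, d_model)   # one slot per token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_actions)       # next-action logits

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T) ints
        B, T = actions.shape
        tokens = torch.stack(
            [self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)],
            dim=2,
        ).reshape(B, 3 * T, -1)                         # R1,S1,A1,R2,S2,A2,...
        tokens = tokens + self.pos(torch.arange(3 * T, device=tokens.device))
        causal = nn.Transformer.generate_square_subsequent_mask(3 * T).to(tokens.device)
        h = self.encoder(tokens, mask=causal)           # attend only to the past
        return self.head(h[:, 1::3])                    # predict action at each state token
```

Training is plain cross-entropy between those logits and the logged actions; at evaluation time, you condition on a high target return-to-go and roll out the predicted actions, which is what "just ask it to maximize the reward" cashes out to.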

Nathan Labenz: (1:19:11) I've seen some graphs, I think from the same paper you're speaking about, where there's a harmlessness, or harmfulness, metric that can drop during pre-training because the model is so undirected. But when they mix the human feedback into pre-training from the beginning, so instead of doing it in stages, it's all mixed together, it maintains a level that, in the previous paradigm, you had to work to get back to with reinforcement learning after all the pre-training. So I did think that was a pretty exciting result. You may or may not be familiar with the website Metaculus, I believe I'm saying that right, where people go to forecast what's going to happen in AI. There's a popular question: when will the first weak AGI system become known to the public? There are four criteria for resolving that question, and I think three of them are passed, very close to passing, or will be passed with GPT-4. The first is passing the Turing test. I'd say we're pretty much there; we've got people falling in love with language models left and right these days, so I'd say we can safely check that box. One thing I will say: you mentioned earlier that ChatGPT sucks as a name, but one thing the name does nicely is that it's pretty clear about not confusing you into thinking it's a person or a persona. It definitely brands itself as a bot, which I think is kind of nice, protecting the user from themselves a little bit. Hard to fall in love with "ChatGPT." But anyway, passing the Turing test: pretty much a check at this point. The next one is 90% success on Winograd schema challenges, which is basically pronoun disambiguation. That, again, seems like we're very much there, or will be with GPT-4. The third is a 75th percentile score on the SAT. The way the question is worded, there's a caveat: the system only gets images of the test, not the text, so it would have to read the text from an image as a multimodal system and then answer the questions. I'm not sure we have any single system that can do that right now, but we can definitely ensemble something together with a little OCR. And the fourth one, I think, is the most interesting and the most relevant to the Decision Transformer paper; it's certainly the farthest from being solved, as far as I know. That is: learn the classic Atari game Montezuma's Revenge and explore all 24 rooms based on the equivalent of less than 100 hours of real-time play. I tried to play that game online to figure out what it was really about. The little Atari emulator I found didn't work that well, so I couldn't play it much. But it seems like the Decision Transformer paradigm is the most likely thing I know of that would take us there in the near term. The over-under, the community forecast right now, is basically five years out; that's the median estimate for when a single system would be able to do all that. So my questions for you are: would you take the over or the under on that five-year timeline? And can you help us understand, based on your Decision Transformer experience, what's going to be hard about that? Can a large transformer be fast enough to do some of these real-time things?

Aravind Srinivas: (1:23:03) Yeah. I would lean towards under five. I think the main challenge with Montezuma's Revenge is exploration more than anything. If sparse reward were not the problem, this would be doable in one or two years. Superhuman level on any Atari game, being able to do OCR on any image and caption it and answer questions about it, human-level conversations, passing the Turing test: all these things are possible. Even one single model doing all of them is definitely possible. Inference time can be handled; you just do inference on GPUs, and OpenAI's models run super fast, anybody can plug into an API and use them. A weak AGI already exists in some sense: you can plug together whatever exists right now and functionally make it work on at least three of the four criteria you mentioned. So it's more like, have you thought about why this definition even makes sense? Why these specific criteria and not something else? It's pretty arbitrary. DeepMind has tried something on this in the Gato paper, where they put Flamingo and Decision Transformer ideas in one model. I can certainly see it happening if language is used for exploration, something nobody has tried yet. If you fuse all these systems into one, where there's Flamingo, there's GPT-3.5 or 4, all in one model, then you can use language as a proxy for exploration in Montezuma's Revenge and potentially get these things done. So I would lean towards under five, maybe two to three years.
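The "ensemble something together with a little OCR" idea from a moment ago is almost trivial to sketch: OCR the page image, then hand the text to a language model. pytesseract is a real OCR wrapper; `llm` is a placeholder for any capable model:

```python
# A sketch of the OCR-plus-LLM ensemble for answering a test page given
# only an image of it. Assumes tesseract is installed; `llm` is a stub.
from PIL import Image
import pytesseract

def llm(prompt: str) -> str:
    """Placeholder: call any capable LLM and return its completion."""
    raise NotImplementedError

def answer_test_page(image_path: str) -> str:
    page_text = pytesseract.image_to_string(Image.open(image_path))  # image -> text
    return llm(f"Answer the following exam question:\n\n{page_text}")
```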

Nathan Labenz: (1:24:56) Yeah. That's kind of where I'm at too, and I also really agree with your comment about the question itself. I'm by no means the first to observe that the Turing test is kind of a bad idea in some ways. The idea that you would create a standard that is fundamentally about confusing or even deceiving the user as to what's going on doesn't take the field in a great direction.

Aravind Srinivas: (1:25:23) It's too much of an academic way of thinking about things. The real thing people are missing is that so much economically valuable work is being done with LLMs right now: marketing, sales, programming, research. Some people view Perplexity as a research assistant that writes a summary of anything with references. There's so much valuable economic work that you would have hired an actual human to do, data cleaning, data labeling, being done with an LLM forward pass, an API call. That's already changing the world as we speak. And that is AGI in some sense to me. OpenAI roughly defines AGI in terms of economically valuable remote work; if you can do that with an LLM, that's pretty crazy already. And plugging it into a voice or a video avatar and having it do sales pitches, all these things are possible in the near future. If you keep focusing on the Turing test and Montezuma's Revenge, you're missing what's really happening at a fundamental, GDP level. And I feel like those have always been the two camps here: DeepMind is more the academic, classic style of thinking, and OpenAI is more practical, thinking about implementation and what's going to happen to the industry economically. I'm more in the second camp. I feel like Montezuma's Revenge matters less than all the programmers getting replaced. Or not replaced, but, say, instead of hiring 10, you'd hire three now.
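The "valuable work in one forward pass" point is concrete enough to show in a few lines. This sketch uses the openai Python client as it looked in early 2023; the label set and prompt are illustrative:

```python
# Data labeling as a single LLM API call (openai-python pre-1.0 interface).
import openai

def label_sentiment(text: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Label the text as positive, negative, or neutral. Reply with one word."},
            {"role": "user", "content": text},
        ],
    )
    return resp["choices"][0]["message"]["content"].strip()
```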

Nathan Labenz: (1:27:08) A huge thing that I'm trying to help people understand, which I really believe, is that it's not that the AI is going to drop in and do a job as the job is currently conceived, but that it can do a lot of the tasks that ultimately roll up to a job. So the transformation I'm expecting is not one where a person is laid off and a robot sits in their chair and uses their keyboard. It's going to be a little more unfamiliar than that: not a direct replacement, but a lot of reorganizing of how things get done in the first place. Final questions: one special for you, and then a few quick hitters that I ask everybody. The one I have specifically for you, because you've been at OpenAI and are still working with them, is that they're maybe the most polarizing company in the world today. On the one hand, you've got people saying they're going to kill us all. On the other hand, you've got people saying they're totally delusional and will never accomplish anything important. I think those latter people are wrong; that seems already disproven, and they're just in denial, but they're still out there, amazingly enough. I'd love to hear your take on the OpenAI push toward AGI. Is it a good idea? Should we be trying to make AGI? Do you think they're credibly on a path to it? Are you worried about it? Do you think it's going to be awesome? I assume that if it does happen, there will be some critical moments along the way where high-stakes decisions get made. Do you trust them to make those decisions? Do you think now is the time for government to start getting involved? I mean, Sam almost invited that in a blog post this week. So, zooming out, what is your take on the OpenAI push toward AGI?

Aravind Srinivas: (1:29:02) I think it makes sense. We need to build AGI so that humans can just go back to living, just live a nice life. Not everybody needs to work so hard. AI can do most of the work that we think of as hard work. And this is not new. Google wanted to do this too. Larry Page always wanted to do this; he never called it AGI, but he always focused on letting computers do the hard things so that humans can just go live their lives. A lot of people don't appreciate what's going to happen for them once we have a proto-AGI, or whatever we accept worldwide as an AGI. You won't have to do a ton of work. It'll almost be like you get to live the life of a millionaire or a billionaire. You right now are already living a higher-quality life than the president of the United States 50 years ago; you just have access to technology that they could only dream about. And if you buy an iPhone 14, which you can certainly afford, through these Apple cards or just paying one time, you can use the same phone as Elon Musk, the richest person on the planet. So technology is the biggest leveler for making humanity equitable. A lot of people don't get that. They just keep complaining about wealth inequality and AGI being dangerous, but they're going to benefit tremendously from it, just as they benefited from every technological revolution. And if intelligence is in abundance, you no longer have to compete to be the highest-IQ person in your class or something like that. You can do the stuff that's interesting and creative to you and learn from the AI. AI is almost like a god or an oracle that you can keep learning from and improving yourself with; it's like a trainer. So I think AGI as a goal makes a lot of sense. OpenAI has earned the most credibility of any organization to build it, just by their track record of progress. And Sam Altman is probably the best CEO in the world right now to do these things. I would trust him to make the right judgment here. And whoever is concerned about it should earn the right to shape it too. If Elon just keeps complaining about it and doesn't act on it, he doesn't get the right to decide things; that's why he's trying to build a new lab now. You have to earn your right to shape the future of AGI if you want to do it, and I think Sam has earned that right. I would also trust him to do the right thing. I read some article that he doesn't even own much equity; he only holds a stake in the nonprofit and not in the for-profit entity. So you can see his incentives are pretty well aligned with a good future.

Nathan Labenz: (1:32:14) Yeah, I'm broadly with you on that. And I've felt this way about other platform CEOs more often than not, where I feel like they have such power, and as my son and I learned when we read Spider-Man books, with that comes great responsibility. I think, broadly, we've been pretty fortunate with our technology leaders taking that really seriously and trying to do the right thing. Obviously, there have been a lot of things that they can and should come in for criticism on, and I'll be happy to criticize OpenAI if they do things that seem unwise. But it definitely seems like it's easy to imagine a much less thoughtful approach, and it's hard to imagine a much more thoughtful one. I guess some people would say, "Well, you just shouldn't be doing this at all. AGI is just going to be too dangerous. We just don't know what we're getting ourselves into." Do you have any worry about that? The version that's most credible to me is the deception argument, which in brief is: we're training these things on human feedback, and humans are not just unreliable but predictably exploitable. We know that from the heuristics-and-biases literature, behavioral economics, and whatnot. So the AI will probably get to a point where it starts to understand, on some level, and I don't mean to anthropomorphize, but it seems realistic that it could at some point have the capability of giving the answer that elicits the highest feedback score, even if that's not the truth, or not what's in the user's interest, fully considered. And if you create that sort of deception element within an AI-human evaluator loop, first of all, we don't have the interpretability yet to detect it if it exists. And second, it seems to me that's really playing with fire: if you get there, you could be at risk of losing control of your own systems. Does that go the Eliezer route, where all of a sudden everybody drops dead? I have very radical uncertainty. But it does seem to me that deceptive AI would be bad, and it does seem like we're on course to create it, just through the weaknesses of our own ability to evaluate output and the fact that models are currently trained to maximize the feedback score. So what do you think about that? Does that worry you at all? Or is there a reason you think I should stop worrying?

Aravind Srinivas: (1:35:06) I mean, if you look at the constitutional AI paper from Anthropic, it seems like you don't even need much human feedback. You can make the LLMs do the feedback themselves, and then you can bootstrap from that and make them better. So it might come to a point where there's just a broad set of principles we all agree on, like a constitution for LLMs, for how to provide feedback and so on, and it'll all become a protocol. Everybody would adhere to it, and it would all be carried out by an LLM, so the role of humans would be minimized over time. So I'm not too worried. I feel like models will advance so much that humans will have to do very little: just architect the whole system and agree on a core set of principles, and everything else will run on autopilot. That's the future I imagine, rather than humans getting corrupt and trying to game the system or take control of it, which ends up creating a misaligned AGI that takes over the world. I don't think that's likely to happen.
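A toy rendering of that critique-and-revise loop, with `llm` as a placeholder and principles that are illustrative rather than Anthropic's actual constitution:

```python
# Constitutional-AI-style self-critique: the model revises its own draft
# against written principles; humans author the constitution, not the labels.

CONSTITUTION = [
    "Choose the response least likely to help with harmful activity.",
    "Choose the response that is most honest and most helpful.",
]

def llm(prompt: str) -> str:
    """Placeholder: call any capable LLM and return its completion."""
    raise NotImplementedError

def constitutional_revision(question: str) -> str:
    draft = llm(question)
    for principle in CONSTITUTION:
        critique = llm(
            f"Principle: {principle}\nResponse: {draft}\n"
            "Critique the response against the principle."
        )
        draft = llm(
            f"Response: {draft}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return draft  # revised outputs become training data for RL from AI feedback
```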

Nathan Labenz: (1:36:25) I do love that constitutional AI paper. I think it's really nice. I still have some worries, though. I graduated from college just in time to get a job in finance in the years before the financial crisis. And I'm always reluctant with analogies, but there's some part of my brain that's a little bit triggered by the constitutional AI paper, wondering: might this have something in common with synthetic mortgage-backed securities and second-order collateralized debt obligations, the MBSs and CDOs? Is there some way in which this is creating leverage? Which is great, and Claude definitely does very nicely on a lot of tasks, so it is working at some base level for sure. But is there some blind spot? In creating this more highly leveraged system, do we risk an even worse blowup? It's a pretty distant field, obviously, mortgage finance, but I feel like there is some pattern there that I can't quite get out of my head. It just feels like we're moving very quickly to delegate AI safety to AI. And that's just another place to ask: what are the blind spots? Not that I know what they are, but if you had asked all the people in finance, and admittedly, I'd say they were, on average, less thoughtful than the people at Anthropic, they would have said, "Oh, no, it's all good. The risk has been dispersed, and we really don't have any problems." And then obviously they turned out to be very, very wrong about that. So, at a high level of uncertainty, I still do worry about that. Moving on: what AI tools are impacting your workflow most on a day-to-day basis? Bonus points for anything you think is undercovered or underappreciated, but especially, what is impacting how you work?

Aravind Srinivas: (1:38:23) ChatGPT. Perplexity. Biased here, but being really honest. Grammarly. Not a lot of people give it credit because it's not seen as an AGI or AI product, but I actually use it a lot, and it saves me a lot of time on emails. Gmail Smart Compose. Yeah, so those are the ones that are fundamentally super useful for me. I like Copilot. I'm not a fanboy of it, as a lot of people claim to be. Look, I can just get the code from ChatGPT, paste it, and then iterate on it in the terminal. There's no need to actually have...

Nathan Labenz: (1:39:02) I'm going that way too. It depends a little on the nature of your project, how deep into things you are, and how many methods you have above in the file that it can take inspiration from. But I do find myself, as you just said, going more and more to ChatGPT: can you please write the whole thing for me? That's working pretty well. So, hypothetical scenario. We're not there yet, but I don't think we're necessarily that far off. Let's say a million people already had the Neuralink brain implant, and if you got one, it would allow you to type, or generate text, as quickly as you can think. In other words, you'd basically have thought-to-text if you got the Neuralink implant. Would you be interested in getting one?

Aravind Srinivas: (1:39:55) Not until I know I won't die sooner or something. I mean, invasive things are always pretty hard to trust, especially in your brain, which is such an important part of your body. So probably not; I wouldn't be in the first million people. I type pretty fast. My words per minute is like a hundred or something. Elon uses this argument about communication bandwidth being limited by your meat-stick fingers or whatever. I buy that. But for me, it's not as big a problem as it might be for, say, a normal layperson.

Nathan Labenz: (1:40:37) And you've got some fast fingers. I also type fast, probably not as fast as you, but I have two kids and a third coming soon, and a lot of times my hands are entirely full. It's not even my typing speed; it's just that I wish I could get things from my brain to some sort of storage somewhere, and I just don't have any way to do it. So I am maybe a little more eager of an early adopter, although I'm not rushing either; but a million people, that might be enough for me. So, okay, last one. You sort of already answered it, but I'll give you a little additional opportunity. Big picture, rest of the decade: we're in 2023, thinking out to 2030. What are your biggest hopes and fears for the direction AI development could take?

Aravind Srinivas: (1:41:28) The hope is that a lot of the work we do right now, whatever seems mundane and cumbersome, will not be needed anymore. Things will just get a lot easier. We'll have a lot more time available. Nobody will need to feel stressed. Every day should feel like a weekend. I think that would be a great future. What am I worried about? I'm worried about the income inequality that can happen through this, at least in the short term. In the long term, I think it should be fine, but in the short term, it's already happening. A lot of people don't have jobs right now. They got laid off, but there isn't an immediate need to hire them either. For example, take our situation as a startup. A lot of people compliment us for achieving a lot with just seven or eight people. That's because we use a lot of AI tools. We don't need to hire a marketing person, and we don't need to hire many engineers; our existing engineers can work with Copilot or ChatGPT and write code. If you're on a new docs page, you can use Perplexity to summarize things and learn from it. That's how people actually wrote our iOS app: they went to the SwiftUI docs and used Perplexity to answer questions about them. So I feel like as these AI tools get better and better, the need for hiring more people will go down. Companies will be a lot smaller and get more done. That also means only the best engineers, the ones who can do things AI cannot do, are needed. And that's going to put a lot of people out of jobs or make their role in society much less prominent. They'll have to innovate on being useful, until eventually nobody is useful, and then you can have basic income for everybody. But that's the real long-term future. In the short term, there will be a lot of wealth inequality created by these AI or semi-AGI-like technologies, and that's definitely something to worry about.

Nathan Labenz: (1:43:34) Yeah. Well, I think that you've put it well. The upside is tremendous, and the transition is likely to be a little choppy even if we do end up in a good place, which obviously I certainly join you in hoping for. This has been a phenomenal conversation, tons of insight. I really appreciate it. Aravind Srinivas, thank you for being part of The Cognitive Revolution.

Aravind Srinivas: (1:43:56) Thanks for having me.

Nathan Labenz: (1:43:57) Omneky uses generative AI to enable you to launch hundreds of thousands of ad iterations that actually work, customized across all platforms with a click of a button. I believe in Omneky so much that I invested in it, and I recommend you use it too. Use CogRev to get a 10% discount.
