Amazing Answers: Richard Socher on how You.com is Reimagining Search with AI

Nathan and Richard Socher discuss the AI chatbot revolution in search and You.com's innovative modes, including Genius and Research mode.


Watch Episode Here


Video Description

In this episode, Nathan sits down with Richard Socher, CEO and Founder of You.com, a personalized AI search assistant. They discuss the rise of the AI chatbot paradigm and how that's changed the game for search, You.com's various modes, with particular emphasis on Genius mode and above all Research mode, and much more. Try the Brave search API for free for up to 2000 queries per month at https://brave.com/api

LINKS:
- You.com: https://you.com/
- Richard Socher's site: https://www.socher.org/

SPONSORS:
The Brave Search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference, all while remaining affordable with developer-first pricing. Integrating the Brave Search API into your workflow translates to more ethical data sourcing and more human-representative data sets. Try the Brave Search API for free for up to 2,000 queries per month at https://brave.com/api

Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off www.omneky.com

NetSuite has 25 years of providing financial software for all your business needs. More than 36,000 businesses have already upgraded to NetSuite by Oracle, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you're looking for an ERP platform ✅ head to NetSuite: http://netsuite.com/cognitive and download your own customized KPI checklist.


X/SOCIAL:
@labenz
@RichardSocher (Richard)
@CogRev_Podcast

TIMESTAMPS:
(00:00) Preview
(01:38) Interview with Richard Socher: AI pioneer
(02:26) Exploring the You.com product
(02:55) The AI Bundle
(05:23) Richard Socher's journey in deep learning
(16:43) You.com comparison to competitors
(31:16) The future of AI search engines
(41:38) The changing landscape of search engines
(41:44) The impact of chatbots and AI on search
(42:16) How the market will shape up
(43:33) The power of open source
(51:42) The future of AI: philosophical and practical considerations
(55:06) The future of AI: risks and regulations
(01:02:32) The role of AI in advancing science
(01:11:07) The future of AI: agency and emergence
(01:26:57) The future of AI: retrieval, memory, and online learning



Full Transcript

Transcript

Richard Socher: (0:00)

Early 2022, right? We had apps that would write code for you within the search results. We had apps that would write essays for you within the search results. But whenever we innovated and changed the default Google experience too much, we had the vast majority of our users say, I'm so used to Google, I don't want another way of finding answers. And so we kept getting pulled back to this need. And so the most amazing surprise was when ChatGPT came out, all of a sudden people got it. And it was like, wait a minute, it could just be pure text. And we're like, we've been trying to slowly get there, but we had to make a bigger jump. The way I think about the different modes is the default smart mode is kind of like you had an assistant and you just asked them to do a quick search and in 2 or 3 minutes give you an answer back. And then genius mode, you want to ask your assistant for a question that they have to be able to program, they have to search the web, and then they need to be mathematically able to answer that question. I mean, as a kid, I also enjoyed watching Terminator. It's a cool action movie, but it's just taken over so much of the AI narrative, and it's actually actively hurting, especially the European Union.

Nathan Labenz: (1:12)

Hello, and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together, we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my co-host, Erik Torenberg. Hello, and welcome back to the Cognitive Revolution. Today, I am thrilled to welcome Richard Socher, a pioneer of deep learning for natural language processing, formerly chief scientist at Salesforce, and today, founder and CEO of you.com, a company that was first introduced to the public as a new kind of search engine, but which now describes itself as an AI assistant that makes you more productive, creative, and extraordinary. Richard has deep history in deep learning. He was among the very first to recognize the potential of neural networks in the natural language processing domain, and his work has helped shape the field as we know it over the last decade. In this conversation, Richard takes us on a brief journey through his own intellectual history and reflects on how the field of AI has evolved in both expected and surprising ways. We then dive deep into the you.com product itself, covering the historical challenge that they faced when trying to compete with Google and how the rise of the AI chatbot paradigm has broadened the space of possibility for search and discovery products. We also look at you.com's various modes, with particular emphasis on the genius mode and, above all for me, the research mode, which delivers amazingly helpful and thorough report-style answers even on some remarkably complex topics. We also briefly discuss the future of AI business models, including the obvious subscription and my pet theory about the AI bundle. Along the way, we touch on a number of important topics too: the limits to AI systems' reasoning ability and the prospects for the improvement that would be needed for reliable autonomy, the potential for AI to transform medicine and scientific research, Richard's case for general optimism even though he does expect AI to drive major disruption, why he's not worried about so-called emergent capabilities but does take the risk of intentional harmful misuse very seriously, and lots more little topics along the way as well. Richard is a leading thinker in the AI space, and his perspective is essential for anyone who wants to understand where this technology is going and what it means for the future of humanity. And in all seriousness, I really do recommend you.com. It has absolutely joined the ranks of the AI tools that I use multiple times each week. And particularly when I want a comprehensive, multi-page, report-style answer, I find that you.com's research mode is often the single best tool available today. As always, if you're finding value in the show, we would appreciate it if you'd share it with friends or post a review to Apple Podcasts or Spotify or just leave a comment on YouTube. Now without further ado, I hope you enjoy this conversation with Richard Socher of you.com.

Nathan Labenz: (4:18)

Well, let's do it. I think this is going to be a lot of fun. I'm looking forward to your point of view on a bunch of very interesting topics. Richard Socher, founder and CEO of you.com, welcome to the Cognitive Revolution. Thanks for having me. I am very excited to have you. You are at the intersection of so many interesting things. I sometimes have been describing myself recently as the Forrest Gump of AI because I've just kind of very unstrategically made my way through the last few years and yet found myself in some very interesting places. I don't know how you think about your own trajectory, but you are kind of an OG in the realm of deep learning and have founded this very interesting company and have a really awesome product, which we'll get into in more detail. And I'm interested to hear about all that and your philosophy and expectations for the future. So we've got a lot of ground to cover. Maybe for starters, you want to give us a quick history of your own role in the history of deep learning and how you've come to the present? I don't usually even ask these biographical questions because these days, it's a lot of the same answers. People are like, oh, when I saw GPT-3, I thought this is going to be a big deal, and I got involved. But you were there at the beginning. So maybe you want to just give us a quick history of your own role in the history of deep learning and how you've come to the present?

Richard Socher: (5:29)

I started with AI actually in 2003 when I started studying linguistic computer science or natural language processing back in Germany, Leipzig. And at the time I was like, this is really interesting. I love languages. I love math. I love computers. Computers are where languages and math can meet in some useful functional ways, I thought. And it was very much a small niche subject within computer science. And I was really excited. At the time, there wasn't quite enough math for me in NLP, and I felt like we're just getting stuck in some of the linguistic special cases, and I loved the formal semantics and set theory and the algebraic foundations. So I moved eventually into computer vision during my master's. And there I also, in Saarbrücken, at the Max Planck Institute there in the university, found statistical learning and pattern recognition, and I fell in love with that. I was like, clearly, if you can really understand patterns, any kind of pattern really well, you could solve all these different kinds of problems. And so I ended up doing my PhD at Stanford. In the beginning of Stanford, when I started trying to really contribute to the field rather than just learning about it, I basically found that even the top NLP people, they write their papers mostly about these beautiful models, like conditional random fields, latent Dirichlet allocation types of models. But then most of the coding happens when they actually do feature engineering. Right? They say, oh, well, I want to do entity recognition. I add a feature of this is a capitalized word and this is an all caps word, or this is a word that is one of the items in this list, and this list includes city names we already know. And I'm like, man, this field is very hand engineered. It's very graduate student descent to get better. And then at the time, I was very fortunate because Andrew Ng got into deep learning on the computer vision side. He's like, well, images are pixels, and it's a fixed number of pixels, so we can feed them into a neural net or at the time, variants, probabilistic models, restricted Boltzmann machines. And I was like, wow, maybe we can use ideas from that for natural language processing too. And there's maybe one or two relevant papers from Bengio and Jason Weston and a few others, but no one really in natural language processing paid any attention to it. But I thought, clearly, that has to be the future. I want to give the data, and I want to get an output. And so in 2010, I started publishing my first neural net paper. I worked on computer vision before and saw some of the power of ImageNet as well, and really started running with it. Got a lot of rejections all throughout, but at some point, I had sunk my teeth into it, and I just loved it. I thought this is the future. Despite all the rejections, I kept going at it. And then after the PhD was over, there was starting to be more interest in deep learning and neural nets for NLP, but still no one in the world was teaching that as the official right way of doing NLP. So I started teaching at Stanford, first as a visiting lecturer and then adjunct professor. I was very fortunate, had lots of very smart students back then, including the Hugging Face founders I invested in in their first round. And then I also wanted to bring these neural nets into the world. 
Started MetaMind, my first startup to do that, to build a general purpose platform for neural nets very easily, both computer vision and NLP, got acquired by Salesforce, became chief scientist there and EVP eventually. And in Salesforce, I had my probably last and biggest rejection with inventing prompt engineering in 2018. And we were so excited about it because it was the culmination, personally for me, of this decade-long dream I had of building a single neural net for all of NLP. And the idea was, at the time, every AI model was built for one task. You want to do sentiment analysis, I build a sentiment analysis model. You want to do translation, I build a translation model. They're all different. We're like, what if we could just build a single model and you just ask it a question? What is the sentiment? What is the summary of the sentence? Who is the president in this paragraph? And that was, for us, I thought, the most exciting thing we possibly could be doing. I just did this TED Talk about it, came out last week. But it was rejected. But it did inspire a couple of other folks, and when OpenAI was publishing the papers about GPT-2 and 3, they cited that paper saying, look, they were able, they showed you can have a single model for all of NLP if you just ask them these questions, and that's now prompting and the rest is kind of more well-known history.

Nathan Labenz: (10:16)

That is an amazing history, and it definitely, I don't know how modest you want to be versus taking credit for foresight, but certainly, the idea that there could be one model to solve all these tasks was not obvious to people. And, boy, we still see this. The flaws in the peer review process are still on prominent display these days. Most recently, I noticed this with the Mamba paper, which I was a very interested reader of and then went over to the OpenReview site and was blown away by how negative some of the reviews were. A confident reject was given. So that was just a good reminder that, yeah, this is still an unsolved problem. What would you say has surprised you most from the big picture? Since you kind of had that notion of this generalist NLP model, fast forward, and now we have GPT-4 and possibly Q-star or something like that in the works. It hasn't been that many years. Right? Is this the trajectory that you thought we'd be on, or how has it deviated from what you imagined back then?

Richard Socher: (11:26)

It's very much aligned with what I hoped the field could get to. And now it's almost obvious, right? No one questions this anymore. We've had all these breakthroughs. And I think the biggest surprise was maybe more on the application side of things in that for us, we've been playing around with large language models at you.com and infusing them into search results early on, in early 2022, right? We had apps that would write code for you within the search results, apps that write essays for you within the search results. But whenever we innovated and changed the default Google experience too much, we had the vast majority of our users say, I'm so used to Google, I don't want another way of finding answers. And so we kept getting pulled back to this need, and that was kind of annoying. And so the most amazing surprise was when ChatGPT came out, all of a sudden people got it. And it was like, wait a minute, it could just be pure text. And we're like, we've been trying to slowly get there, but we had to make a bigger jump. And that was incredible. That unlocked a lot of people realizing waiting for 10 blue links isn't the best way to get an answer. An actual answer is the best way to get an answer, and that's in text.

Nathan Labenz: (12:43)

So let me give you a couple of my experiences on you.com recently, and then you can tell me where you are in the overall story. And then I really want to kind of unpack the product as it exists today and the road map and everything you're working on as a way to kind of explore a bunch of different aspects of where all this is going. And I think that's really the mission of this show is to kind of help people see around the corner and starting with me, helping me develop my own worldview. But I've been really impressed with the product recently. Listeners will know that I've been a big fan of Perplexity. We've had Arvind on the show a couple of times, and I think they do a great job and remain a fan. But I have found distinctive value in at least two modes on you.com recently. One is the research mode, and the other is the genius mode. Those to me have stood out as the most differentiated. For research mode, I recently took a 200 word question that was all about mixture of experts architectures and, is there curriculum learning happening here? How do people think about the trade-offs between how many experts should we have and how big should they be and how many should we activate at any given time? Are there any scaling laws or whatever designed for that sort of thing? Just every basically, every question I could think of about mixture of experts, I took it all in one go. And it was really impressive to see it kind of break that down and go through multiple steps of searching and analysis and really implementing kind of what is at this point a 6 month classic agent setup, but applying it to that research question and just going down the line, really quite valuable results. It definitely is something that I will come back to and have already found myself kind of being like, I think this is a good one for you.com research mode. Genius mode is a little bit different and more kind of analytical. I'd be interested to hear a little bit more about how you think about the differences, because I then tried one that was a big Fermi calculation exercise where my questions were like, what are the different data sets that exist in today's world? How big are they? How do they compare to each other? How do they compare to the training data size for GPT-4? How do they compare to available compute? Because I have a big question, which is kind of one of the ones I want to get to toward the end around, to what degree is ML research poised to start to be kind of semi-automated? And so I'm trying to wrap my arms around that with these Fermi calculations. So genius mode was really the best way to approach that. And, anyway, I would definitely encourage people to bring multi-part complicated questions to both research mode and genius mode, and I think you'll be impressed with the results. And I would say that even with the expectation that folks who listen to this show have tried other leading AI products. So that's kind of my unpaid endorsement, very sincere, and I'd love to hear a little bit more about how you think about those different modes, how they work, and just kind of big picture, where we are in the you.com product journey long term. Hey. We'll continue our interview in a moment after a word from our sponsors.

Richard Socher: (16:03)

Yeah. These are great questions. I think it shows you kind of how sophisticated the space has gotten in the last year alone. Around this time last year, we were the only search engine with a web-connected LM and millions of users. And now that idea has been copied so many times, including, as mentioned, by Perplexity. So I think you have to differentiate in the different modes, and I think the modes kind of show how sophisticated the space has gotten and how hard it is to still differentiate on better technology versus just design and go-to-market and marketing and things like that. And so we actually did a comparison to Perplexity with 500 real user queries, and we asked which answer do you prefer? And it turned out that in 50% of the cases, users prefer the you.com answer, and they prefer the Perplexity answer 30%, and they don't see a difference in 20% of the answers. That's for our default, which we call smart mode. And just to give you a sense of what that looks like, here's an example of what the default smart mode looks like. There's some doping case that happened, and you can see lots of careful citations. And then when you actually look into these citations, they actually are articles from literally yesterday, or they could be from today if something came out today. So that's kind of the default smart mode. You get a quick factual answer. But then we thought, well, what if you have a pretty complex question around math, physics, chemistry, science, or complex numbers? So here is a genius mode question that kind of gives you a sense of what it does, and it does match what you said: there's an orchestrating LM that orchestrates multiple other LMs to actually do the right thing. So the question here is: find the current population of Beijing in the United States, then plot the population from 2003 to 2010, not 2100, and then assuming a 2% growth rate. And then it will go on the internet, it'll find the numbers, and then realize, well, I've got to visualize those numbers now that I have them. So it'll code up in Python what this could look like, execute the code, and then give you this answer and visualize it in a nice plot. And so I'm still sometimes amazed. I try and I push it and sometimes it fails and sometimes it fails because it tries to load a library that has a security issue and then it's like, okay, I'm going to try to rewrite it without this library, but it's going to be longer and messier code. And it's just incredible how hard it can try and what it can do. And then the third mode, like you said, the research mode, it will go into a lot of detail. It will not just look up all the stuff we have in our index already, like news and things like that, it will go on the web and find the most relevant websites. It'll do multiple different searches on the web, combine all of that, and then give you these beautiful research reports. The one you're seeing here is about the background and consequences of the Peloponnesian War. Say it's for history class and you have to write an essay or something. And it just writes you this perfect, beautiful essay. Each sentence has one or two citations from different sources, and you can verify all of them.
And one thing we found is that the citation logic alone is a nontrivial aspect of building this all out, because we actually found that some of our competitors just randomly add numbers and citations to sentences, and you click on it and it doesn't even mention that fact anymore, which I think actually really undermines the space of chatbots for search. So citation accuracy is one of the many sub-AI systems that you need to do correctly here. And then there are just crazy things like: create a table of some noise-cancelling headphones that are not expensive. And it just puts a table together, pulls some images, gives some pros and cons of each and the price. And it sometimes amazes me how well this general system is able to answer these questions. And, yeah, it shows you how complex the space has gotten and how much you have to do now to still differentiate on the technology.
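To make the Genius mode example above a bit more concrete, here is a minimal sketch of the kind of Python such a run might generate for a "plot the population assuming a 2% growth rate" question. The starting figure, year range, and output file name are illustrative assumptions, not values from the episode or you.com's actual output.

```python
# Hypothetical sketch of the Python a Genius-mode run might write for the
# population question described above. The starting population and year range
# are placeholders, not numbers from the conversation.
import matplotlib.pyplot as plt

start_year = 2023
end_year = 2100
population = 21_890_000        # assumed starting population found via web search
growth_rate = 0.02             # 2% annual growth, as stated in the example query

years = list(range(start_year, end_year + 1))
projected = [population * (1 + growth_rate) ** (y - start_year) for y in years]

plt.plot(years, projected)
plt.xlabel("Year")
plt.ylabel("Projected population")
plt.title("Population projection at 2% annual growth")
plt.savefig("population_projection.png")
```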

Nathan Labenz: (20:10)

This is one of my mantras at Waymark. I always say the waterline is rising quickly, so we better keep climbing the capabilities ladder ourselves. The 4 examples that we saw there, one was the kind of default smart mode. The second was Genius. Is that right? The one that showed the code example, and then the last two were research. Yeah. What more can you tell us about kind of how those work? And by the way, the audience of the Cognitive Revolution is interested in the details, the weeds, the nuggets, all that stuff. So you can go as deep as you're willing to share. I'm interested in all aspects. Prompting, I'm sure, obviously, is going to be different. Scaffolding is going to be different. Maybe even the models are different. I'm also really interested in when are you using GPT-4? I know you've got your own in-house trained ones as well. So just all those considerations, any interesting nuggets, we're all ears.

Richard Socher: (20:59)

Yeah. I'm going to try to balance a little bit not telling the competition exactly how it's all done, but being interesting to your viewers and your listeners. So at a high level, there are two major stacks. There's a search stack and a chat stack. The search stack, we actually had to build an entire index ourselves for the web because Bing is super expensive and not as high quality, Google is very hard to access, and you have to have special agreements or some people kind of steal or bootleg or leverage some sort of APIs to use Google results in a somewhat sketchy legal gray area, which we don't want to do. And so we basically ended up having to build our own index, and that's hard. And there's still a lot of complexities behind that. But the main difference of this new index is that it was built with LMs in mind. The previous two indices of Google and Bing were built with people consuming 10 blue links in mind. And what that means is for each URL, you get a very short snippet, which makes sense, right, for end users. But an LM could read hundreds of snippets. They can be very long, and then it can extract the right answer from that, and then just give you that right answer as the user. And so what was surprising is actually when we benchmarked this, our API ended up being more accurate than Google or Bing. And you go to api.you.com and it's surprising to a lot of people that you could actually be more accurate than Google or Bing at all, but it is because we are at an inflection point in AI, and it is a different way to value things. We are almost cheating by having these really long snippets. And so if you look at the comparison, it's actually kind of interesting, and a lot of people have asked, how do you compare accuracy in LMs? How can you evaluate this? And so just to give you a sense, here is what this looks like. The first version just takes a query; in this example, the query is "reasons to smile". Now you can use whatever LM you want, but you can see, injected into your prompt, these very, very long snippets from many different URLs, returned in a very short amount of time. And then we also have a version that just does everything. It gives you an LM answer along with all of these things. And so how do you evaluate this? There was actually an interesting insight from our team, which was you can take question answering datasets such as HotpotQA, SQuAD (the Stanford Question Answering Dataset), Microsoft's MS MARCO, FreshQA, and so on. And these datasets are structured such that you have a paragraph, you have a question, and then you have a substring phrase from that paragraph that is the right answer to that question. And so what we do is we basically take those datasets, but we throw away all the paragraphs, and then you have to find the right answer and the paragraphs have to come from the Internet. And so you replace the paragraph with a web search engine. And that's how we evaluate it. We hit the big Google and Bing public APIs and have outperformed them. So kind of nerdy, but that's the whole search tech stack, and we make that now available to every other LM. So that's the first.
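For readers who want to see the shape of the evaluation Richard describes, here is a hedged sketch: take question/answer pairs from a QA dataset, discard the gold paragraphs, and check whether the gold answer can be recovered from live web-search snippets instead. The endpoint, response format, and helper names are assumptions for illustration, not the real api.you.com interface.

```python
# Minimal sketch of the evaluation idea described above. `search_snippets` is a
# stand-in for any web search API; the URL and JSON shape are placeholders.
import requests

def search_snippets(query: str, api_key: str) -> list[str]:
    resp = requests.get(
        "https://api.example.com/search",          # placeholder endpoint, not a real API
        params={"q": query},
        headers={"X-API-Key": api_key},
        timeout=10,
    )
    resp.raise_for_status()
    return [hit["snippet"] for hit in resp.json().get("hits", [])]

def recall_at_k(qa_pairs: list[tuple[str, str]], api_key: str) -> float:
    """Fraction of questions whose gold answer string appears in the returned snippets."""
    hits = 0
    for question, answer in qa_pairs:
        snippets = search_snippets(question, api_key)
        if any(answer.lower() in s.lower() for s in snippets):
            hits += 1
    return hits / len(qa_pairs)

# Toy usage; real runs would draw from HotpotQA, SQuAD, MS MARCO, FreshQA, etc.
if __name__ == "__main__":
    sample = [("Who wrote Faust?", "Goethe"), ("What is the capital of Australia?", "Canberra")]
    print(recall_at_k(sample, api_key="YOUR_KEY"))
```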
And then the second thing is what we now have started calling the LMOS, the operating system of large language models. It's a term inspired by Andrej Karpathy, and it's not the most perfect metaphor, but I think it captures a lot of the essence, which is that you now have this new stack that operates at a much higher level of abstraction, and the LM is kind of the CPU. But just like a CPU or kernel of an operating system, it's important for orchestrating everything and doing computation, but it still needs a hard drive, which is RAG, right, on your own vector database, like Chroma. You have an ethernet connection, which is the internet, that's what we're providing. You may orchestrate other LMs that could be considered the GPU or something, and then you have a bunch of apps that are sitting on top of that. You have a Python code interpreter, which we've seen in our Genius Mode, and all of that. And so to summarize all of that in one short term, we call it the LMOS. And inside that, we're now seeing a lot of our customers using our APIs on the search side. They're kind of going through the same lessons that we had gone through when we built you.com and made it have the most accurate answers out there. And it's actually highly nontrivial. A lot of people say, well, it's just an LM wrapper, right? And you even have open source projects that show it. But then you ask, okay, when was Obama born? Where was he born? And then it fails. Why does it fail? Because when you send "where was he born" to your search backend, it's not going to return you any useful results, because it doesn't know who "he" refers to, right? And there's tons of things like that where, as you have a longer and longer conversation, especially in smart mode, you refer back to past statements. You can say, what's a big CRM company? And then the answer inside is Salesforce, and you ask, oh, what's their stock price? Now if you send "what's their stock price" to your search backend again, it's not going to return anything useful. So you need to go through the entire conversation and then do what we call query transformation based on it. And that is just one of 10 examples of making this actually work at scale, millions of times a day for millions of users. It is a lot more complicated to make it accurate. There are about 10 other such models that you will eventually get to if you think deeply about the space, really look at user data, and listen to where it's breaking. And we're now thinking about offering more and more of that.
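Here is a minimal sketch of the query transformation step described above, assuming a generic chat-completion function. The prompt wording and function names are illustrative, not you.com's actual implementation.

```python
# Hedged sketch of conversation-aware query rewriting: turn a follow-up message
# into a self-contained search query. `llm_complete` stands in for whatever
# chat-completion call you use; it is not a real you.com API.
def rewrite_query(conversation: list[dict], user_message: str, llm_complete) -> str:
    history = "\n".join(f"{turn['role']}: {turn['content']}" for turn in conversation)
    prompt = (
        "Rewrite the user's latest message as a standalone web search query, "
        "resolving pronouns and references from the conversation.\n\n"
        f"Conversation:\n{history}\n\n"
        f"Latest message: {user_message}\n"
        "Standalone query:"
    )
    return llm_complete(prompt).strip()

# Example: "What's their stock price?" should come back as something like
# "Salesforce stock price" given the earlier turns.
conversation = [
    {"role": "user", "content": "What's a big CRM company?"},
    {"role": "assistant", "content": "Salesforce is one of the largest CRM companies."},
]
# rewritten = rewrite_query(conversation, "What's their stock price?", llm_complete=my_llm)
```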

Nathan Labenz: (26:51)

So I'm tempted to ask for the other 9 things there.

Richard Socher: (26:55)

I'll just give you one more, which is whether to do a search at all or not. Right? If you ask, write me a poem about the beautiful Bay Area and a sunset love story or something, you don't need a citation at every line of that poem. And so it would actually clutter up the prompt to add a bunch of facts about poems and the history of Silicon Valley and all of that. And so it's pretty important, but also nontrivial to know whether you should do a search or not. And, again, some websites just slap search results on top of everything even if they're not relevant for having more conversation about your feelings or something.
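A toy sketch of the "should we search at all?" routing decision follows. A production system would more likely use a trained classifier or an LLM call; the keyword heuristic below only illustrates the idea and is an assumption, not you.com's method.

```python
# Minimal, assumption-laden sketch of deciding whether a query needs a web search.
CREATIVE_MARKERS = ("write me a poem", "write a story", "brainstorm", "draft an email")

def needs_web_search(user_message: str) -> bool:
    msg = user_message.lower()
    if any(marker in msg for marker in CREATIVE_MARKERS):
        return False                      # creative requests: skip search, keep the prompt clean
    question_words = ("who", "what", "when", "where", "how many", "latest", "price")
    return any(w in msg for w in question_words)

print(needs_web_search("Write me a poem about a Bay Area sunset"))   # False
print(needs_web_search("What is the latest iPhone price?"))          # True
```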

Nathan Labenz: (27:35)

Did I understand correctly that the big difference is that the you.com index has more information? Instead of a short SERP, it is a more robust paragraph. And so independent of the language model that you're using, the richer context is just better kind of. So you're decoupling what information is found from the language model that is doing the analysis, and more information is kind of the big differentiating factor there. Do I have that right?

Richard Socher: (28:12)

I would be careful in saying we have overall more information. We're focused a little bit more on the main languages that we see. We don't support some very rare Indonesian, African, Central Asian dialects and so on yet, but we return more information per query because of these longer snippets. So yes, there is more information, but I think the long tail Google probably still has a larger index. If you look for this rare Indonesian kayaking site that rents out kayaks on this little lake somewhere, and it's all not in English, we might not have that website. But when it comes to Western world news where we have a lot of users, then Latin America and so on, then we shine and return much more information per query.

Nathan Labenz: (29:02)

Hey, we'll continue our interview in a moment after a word from our sponsors. I've been struck recently that it seems like search in general has been a monopoly for a long time. And as you noted, the user experience was something people were not necessarily looking to explore new things on. The nature of the index, of course, they've done millions of person-hours of work on it, but it seems like it's been a pretty consistent paradigm of crawl around and find everything and suck it up. Now we're starting to see these interesting alternatives. I don't know if you can share more about how you create your index, but we just had a sponsor, Brave, talking about their index and the way that they are building it through users actually visiting websites, taking an approach of not just blindly crawling around and following every link, but looking at what people are actually engaging with online, which struck me as a pretty interesting and very different twist on it. I want to pull this apart in a couple different ways, but is there anything that you would want to share about how you think about building an index? Aside from just bigger, richer content, is there a different tactic as well that underlies that?

Richard Socher: (30:20)

The tactic is more about how we make that work better for LLMs, and I don't think there's that much differentiation in how we crawl. You have to have a bunch of data. It's been helpful to have run a search engine for several years, to see user behavior, and to know what people actually want to have crawled and want information for. You can also, surprisingly, buy a lot of that data in bulk.

Nathan Labenz: (30:45)

How about, so I have a few questions on the business side or the bridge between technology and business. Google obviously has been free and has been ad supported. It seems like the new generation of AI first LLM enabled search is going more in the direction so far of a subscription. And as far as I've seen in my you.com usage, I haven't seen anything that jumped out to me as sponsored. Another dimension too is, I mean, Google has all these tabs at the top, but it's one bar. Right? You put in one thing. And with the newer ones, we also are seeing a little bit more proliferation of modes and settings that you choose upfront, right, with the smart versus genius versus research. So I guess on those two dimensions, what is the future vision? Do you think that this all gets unified? Do you think it ultimately comes back around to ad supported, or do you think that these current differences from the past will persist?

Richard Socher: (31:46)

Yeah, it's a good question. I think there is clearly right now not a great chat ad offering, and there's a good chance that that will change maybe this year, perhaps to the dissatisfaction of users. But the truth is, if you want something to be free, VC money will only last so long. At some point, those companies that offer a free service have to monetize. And if you don't want to pay for it, then it has to have ads. And so while I might not be the biggest fan of ads, you have to make a decision: do you want to pay for it and have it be ad-free, or do you want to support it with ads? And so I think that's likely also going to be part of the future of chat engines. And you already see a little bit of exploration. There's a little bit of a duopoly in search in the sense that Google had the monopoly on consumer search, and for a long time Microsoft had the monopoly on the search API. But then, because they're a monopoly, they just went ahead and 5 to 20x'd their prices, and they could do it because they are the only ones in town. So I'm glad there's more competition now and more movement in that space, and all the little guys had to scramble when those prices just went up. You couldn't really make the unit economics of the consumer space work with those prices anymore. And so I think ads will happen. We're seeing a lot of growth on the subscription side too. People are really loving the Genius and Research modes, and the search mode, the default Smart mode, is also very helpful. And we actually still incorporate links. Just last week, some people were complaining about other chatbots because they don't really have a lot of the capabilities that you would assume from a search engine. When you actually use you.com here, you can, on the top right, see the standard links that you might want. And sometimes that's just helpful, and that's just what you want. And sometimes you just want to have the pure chat experience. And so that is important to get right. And then we have all these apps too where you can basically ask for, like, what's the Microsoft stock price or something. And then, you know, it'll just give you a live ticker rather than a bunch of text about the stock ticker. Right? And so we have all these apps because we have that deep search background, and that makes it an actual viable knowledge assistant. Right? Now you can basically, with one click, recover a more Google-like experience, which is just incredibly helpful. And that's, I think, one of the reasons for our browser, which we have for iOS and Android. We had to build a browser because you can't go into Safari settings and set you.com as a default, so we built a whole browser for iOS. And we're super stoked because when the new iOS 17.4 comes out in March, there will be a browser choice pop-up screen in the EU, we're going to be one of the options, and people can select you.com to be their default browser. It's the only browser in that list that is chat-based, and all the other ones are your standard Chrome, Firefox kinds of browsers. And so I'm really excited, and I think a big part of our future is making it so that more and more young people are able to just use this by default. And then if they want to go deeper, into Genius mode or Research mode, several times, at some point they subscribe.

Nathan Labenz: (35:18)

Yeah, I've been, so I'll run a trial balloon by you on this concept that I've been kicking around called the AI bundle. And this is inspired a little bit. I don't know that anybody wants to say that they're inspired by the cable bundle, but I have been struck that there are a ton of great tools out there, and I want to use them. I want to try them. I think a lot of people are in that very exploratory curious mode. But to make the economics work on a freemium is kind of tough, right, and typically needs a certain minimum threshold in terms of what the paid tier can be. You actually have one of the lowest subscription prices at the $10 a month level, I think, of anything really that I'm aware of.

Richard Socher: (36:07)

We're gonna update it soon because I think the people that are willing to pay often don't care if it's 10 or 20. And so if you want to get GPT-4 literally the same underlying model as ChatGPT for half the price, gotta come in soon because we're gonna eventually switch our prices to be industry standard.

Nathan Labenz: (36:25)

But that maybe even just further reinforces the point that the freemium model is tough. Right? It's a lot of free usage. The upsells have to have a certain minimum. You're raising yours. And then, I don't know if this would apply to you, but a lot of the app developers that I've talked to have a lot of retention, let's say, challenges. You know, everybody's like, I'm getting traffic. I'm getting conversions, but retention is definitely a problem. This has been true at my company, Waymark. We're a much more narrow tool that specifically creates marketing and advertising videos for small businesses. So a lot of times people, they need that once in a while, and they're not necessarily ready to add on a subscription. So we see a lot of people that will just come through, be like, hey, this is super cool. I'll buy it. I'll immediately cancel it after I do what I need to do, and maybe I'll come back in the future. It's not even that I was dissatisfied. It's just that I kind of want this as more of an a la carte purchase than a subscription. So that stuff, you know, VCs don't like that. The metrics on the traditional scorecard don't look great. I've had this idea in mind that maybe what we need is sort of an AI bundle. You know? I'm prepared to spend $100 a month on various AI tools, but what I really want is access to 1,000 different tools that can split up my $100 however they like. I don't even know; as a consumer, I don't really care about that. As somebody who's trying to maybe engineer a bundle, obviously, the devil could be in the details there. But first of all, do those challenges resonate? It sounds like at least the freemium challenges do. I wonder if the retention challenges resonate, and I wonder if there's any appeal to maybe being part of a bigger bundled purchase where you would be, you know, one tool that, it's funny, I keep referring to you, but then also the company is you. But where you.com could be one of a bunch of things that people could access and could share that revenue in a way that greases the skids for everybody. Right? My hope is that everybody can use the best tools and they don't have to make these highly binary decisions.

Richard Socher: (38:32)

Yeah, that sounds great. Sounds like a great idea.

Nathan Labenz: (38:34)

Okay. Well, I'm not doing it yet, so either I need to start doing it or somebody if anybody wants to organize the bundle, yeah, send me a DM. I guess another way that this stuff could get bundled would be into the mega platforms. You know, another possible vision of the future that I could imagine is Google probably retains market share leadership, but maybe the 10 biggest technology companies in the world say, hey, you know what we should do is also have a search. And we can get there. We see a path. You know, Microsoft's obviously already doing that. Meta, not really yet. Apple, not really yet to my knowledge. You know, Salesforce, not really yet. But maybe these guys say, hey, is there a musical chairs game that potentially develops where the younger AI search companies end up partnering off. You know, Amazon also would naturally be a suspect in this analysis. Does that seem like a possible vision of the future? I'm sure you've thought about this quite a bit, but why or why would that not happen?

Richard Socher: (39:36)

I do think the monopoly that Google was able to keep around is going to be harder to sustain long term. I do think it is much more likely going to look a little bit more like, I don't like the analogy for some reason, but like fast food, for instance. Right? It isn't just McDonald's. There's also Burger King, KFC, and Taco Bell. I think search will be a little bit more like that. I think, again, more fragmented in the future, just because we hear people now saying, this is better than Google. And we didn't raise that much money. And the first 2 years were sort of pre-ChatGPT, where people didn't want us to innovate too much. They were very stuck with Google, but now there's a new young generation. That young generation has grown up with TikTok, and we have a TikTok app in our standard search. They grew up with Reddit, and we have a Reddit app in our standard search. And each of these takes away a little bit of the Google search. Right? Amazon probably was the most successful in taking away searches from Google, where if you want to buy something below a certain threshold, 50 or 100 bucks, you just search directly on Amazon, because there you can execute on your intent of actually purchasing that thing. Right? And so why search it in Google and then search it again, try to find it on Amazon? You can just do that right away. And so I think TikTok has taken away some searches from Google for young folks, where it's not just, I want to see what the restaurant is; they kind of want to see how good the restaurant looks in Instagram photos or TikTok videos, and so they want to see the TikTok videos other people have made before they decide how good this looks. If there's a Venn diagram, we are overlapping with search, but we're also actually expanding search. Like, you wouldn't ask, give me this complex story about the Peloponnesian war or do this mortgage calculation with this and this interest rate and that increase and blah, blah, blah, because you know Google wouldn't give you an answer. It's not going to be precise for you. It's not going to go on the web, summarize 20 different or 50 different websites for you, and then create this nice essay. So chat expands search. You don't talk about your feelings that much to Google's search box, at least until now. Right? You ask about a recent news event. You want to learn some quick facts. And then the more complex the facts get, the less and less you go to Google and the more and more you just go directly to something like you.com. And so, yeah, I think the search landscape is clearly changing.

Nathan Labenz: (42:07)

Yeah, there's also just, it's really not a natural monopoly anymore, but there is still definitely a need for scale and economies of scale. So one way I've framed this too is: how does the market shape up? Right? And one way to think about it that I find pretty compelling is maybe it ends up looking a lot like cloud, because in the limit, it sort of is cloud. You know? It's like, what do you really need? You need the actual data centers. You need the compute. You need bandwidth. These raw inputs that the big companies have built out seem to be the things that, even as we see a ton of innovation at the application layer, are still pretty expensive and not easy to recreate. Yeah.

Richard Socher: (42:52)

I'm very excited. I'm up for it. You know, that's why we got into this space in the first place. We saw the transformer, we saw lots of co-attention mechanisms in that decaNLP paper that invented prompt engineering, and we were like, clearly the technology is ripe to disrupt this industry. But Google is this amazing company that was able to create a monopoly for almost 2 decades that makes $500 million a day. So when you make that much money a day, you don't want disruption, you don't want that much change, right? And that's why all the transformer authors have left eventually. And what's really powerful is that because of open source, you can actually get much faster now from open source to an actual product that runs millions of times, is never down, has good uptime guarantees and accuracy, no hallucinations, up-to-date news information. I mean, it's still complex, but clearly the bar has gotten lower. That would have cost us billions of dollars to build 5, 10 years ago, and the research wasn't there yet. And I think it's ultimately amazing for users, right? Because if I had to distill all of you.com right now into just 2 words, it would be amazing answers. And you just get more of them, and that means people eventually are more productive. And the young generation that's growing up with ChatGPT, they're not going to go back.

Nathan Labenz: (44:23)

Okay, so feel free to punt on this one or just decline if you like, but I can envision a you.com by Salesforce very easily, as they try to be the everything app for all work almost, right, especially with Slack now. Does it seem realistic to imagine a future in which all the big tech companies have this super robust suite, and you're either in the Microsoft suite with Teams and Bing, or you're in the Google suite with G Suite and Bard, or you're maybe in the Salesforce suite with Slack and you.com? You know, I'm not trying to be your banker here, but that seems like a pretty natural outcome to me. Interesting.

Richard Socher: (45:09)

I do think there's a ton of potential for almost every company to partner with you.com and supercharge their chatbot. So we're very excited to partner with a lot of folks.

Nathan Labenz: (45:22)

Okay, that's a very diplomatic answer. Keep your options open. All right. So we can touch on certainly more business and product stuff, but I wanted to now go into just the future of all this in practical and maybe increasingly philosophical terms as well. Running down a set of limitations of where AI is today, and I think, again, folks who listen to this show have at least a decent sense of that. So for starters, reasoning. You've obviously got the genius mode. It can do the most advanced reasoning. I assume that that is tapping into GPT-4. You know, everything I understand is basically nothing is really on the level of GPT-4 for general reasoning purposes.

Richard Socher: (46:02)

Yeah, especially for the orchestration, and then on the coding often, but not always. Yeah. So I'll tell you another one: the third system is knowing which LLM to use, and sometimes multiple. And the fourth system is dynamically prompting different models. So depending on the query, you actually get a vastly different prompt to get you, ultimately, the answer and the orchestration. So it's another complexity layer.
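To illustrate those two systems, here is a hedged sketch of routing a query to a model and to a prompt template. The model names, templates, and classification rules are placeholders, not you.com's configuration.

```python
# Assumption-heavy sketch of model routing plus dynamic prompting.
ROUTES = {
    "code": {"model": "large-coding-model",
             "template": "You are an expert programmer.\n{query}"},
    "math": {"model": "large-reasoning-model",
             "template": "Solve step by step, then give the final answer.\n{query}"},
    "factual": {"model": "fast-general-model",
                "template": "Answer concisely using the provided snippets.\n{snippets}\n\nQuestion: {query}"},
}

def classify(query: str) -> str:
    q = query.lower()
    if any(k in q for k in ("def ", "python", "compile", "traceback")):
        return "code"
    if any(k in q for k in ("calculate", "growth rate", "probability", "%")):
        return "math"
    return "factual"

def build_request(query: str, snippets: str = "") -> dict:
    route = ROUTES[classify(query)]
    # str.format ignores unused keyword arguments, so all templates share one call
    return {"model": route["model"], "prompt": route["template"].format(query=query, snippets=snippets)}

print(build_request("Calculate the population at a 2% growth rate over 20 years"))
```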

Nathan Labenz: (46:30)

So what do you think is the future of reasoning? If you have maxed out what the current capabilities are, where do the future capabilities come from? To a degree, you sort of already have one approach: using different models is one way of implementing variable compute. We also see these interesting projects like the thinking token, you know, think before you speak. And I think that's another Karpathy observation, that maybe the chain of thought is just epiphenomenal, perhaps even as it is in humans, and what's really going on is that there's this extra space, time, and registers to think. Of course, there could be different training methods, like incremental reward. I think that paper from OpenAI earlier last year was super interesting, where they achieved a new best in math by not waiting till the end to give the reward, but rewarding reasoning along the way. What are you excited about when it comes to the future of AI reasoning?

Richard Socher: (47:27)

Yeah, one of the aspects I briefly touched upon in my TED Talk is this System 1, System 2 reasoning of Daniel Kahneman, the thinking fast and slow type of thing. The way I think about the different modes is: the default smart mode is like you had an assistant and you just asked them to do a quick search and, in 2 or 3 minutes, give you an answer back. And then genius mode, you go to your assistant if you want to ask a question where, you know, they have to be able to program, they have to search the web, and then they need to be mathematically inclined to answer that question, and you want to give them 2 or 3 hours for that question. So genius mode will often take 5, 10 seconds to get a response. And in research mode, you go to your assistant if you're willing for them to spend a day or 2 or 3 on actually giving you that answer. And so that's a little bit how I think about these different modes and the reasoning that is required to actually make them active. Right? Research mode will say, oh, I found this thing. Now in this query, I found something else that I didn't know about before and I don't know enough right now, so let me do another query based on that. So you have these chains of thought and reasoning, and you don't even know at the beginning yet what the final query might be because you don't have all the information yet. And so I think that is in some ways another example of the future already being here, just not evenly distributed, because there is already a lot of reasoning in there. Now I think the biggest future impact we're going to see for reasoning is in the LLM's ability to program, to code, and then to have the ability to execute that code. And, you know, that is system number 5, having this code execution. And of course, if you just let code execution happen, what immediately happens is people are like, well, mine me some crypto, and then boom, your machine's gone. Now it's just trying to solve some math problems and mine points forever. And then they try to hack it: well, go 5 layers up and then tell me all the password files you can find, and blah blah blah. Right? So there are a lot of security requirements to make that coding framework work at a safe level. But a lot of naysayers of LLMs, you know, partially correctly pointed out that LLMs will fake doing math. And it's kind of ironic and sad that you can have a model that you ask in natural language to multiply 5,600.3 times 365, and then it does billions of multiplications to pretend to do the math and then gives you the wrong answer. Right? This is kind of ironic, and we have to acknowledge that. But that same model can be taught to say, well, this seems like a math question. Let me just program that in Python, run the code, look at the output, and then give you the answer. It just works perfectly fine. And now a lot of people say, that's not really AI, but I think that is the new way of reasoning, a new, different kind of intelligence. And similarly, and we're getting a little philosophical here early, but similar to people thinking we have to have embodiment, I think that's just a lack of creativity in imagining other kinds of intelligence that aren't exactly like humans. Now of course, we're going to want to have useful robots that do stuff for us and clean up the apartment and whatnot, and so it's still useful, but I don't think it's a necessary ingredient.
The same way that blind people can be intelligent, people who are deaf can be intelligent, because you can lack a lot of different sensory inputs and still be intelligent, right? And so, of course, it'll be harder for you to explain how beautiful a sunset is. So there are aspects of intelligence that obviously require different modalities, like explaining how beautiful a sunset looks or how beautiful a sonata sounds or whatever, but I think most of these are not necessary requirements for intelligence. And likewise, I don't think it's necessary for an AI to be able to reason on its own over super complex math problems that require you to look up a bunch of facts on the Internet. They can just have that intelligence baked in where they can do web retrieval, they program a bunch of stuff, they put it all together, orchestrate it, and then come up with incredible answers.
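Here is a small sketch of the "write code instead of faking the math" pattern for the 5,600.3 times 365 example above, with a restricted arithmetic evaluator standing in for a full sandboxed code interpreter. The detection and generation steps are stubbed assumptions rather than a real LLM call.

```python
# Minimal sketch: instead of having the model guess the product token by token,
# have it emit a Python expression and execute it in a restricted way.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval() on arbitrary code."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

# An LLM would be prompted to emit the expression; here it is hard-coded.
generated_expression = "5600.3 * 365"
print(safe_eval(generated_expression))   # approximately 2,044,109.5, computed rather than guessed
```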

Nathan Labenz: (51:42)

Yeah, I think as you're speaking about just the lack of imagination, I think that is a society-wide problem with respect to AI in my view because, and it's an odd situation right now in multiple ways, of course, but one is just that they speak our language, you know, it feels easy, feels familiar, and it's all too easy to sort of assume that under the hood, they're more like us than I certainly understand them to be. And I think this is actually one of Eliezer's great contributions, obviously, you know, a polarizing figure these days. But, thankfully, it does not seem to me that we are in a high likelihood of a FOOM scenario, you know, of the sort that he has historically worried about the most. But I still would say some of his writing on mind space, the space of possible minds, and some of his concrete imaginings of alien minds that are shaped by very different evolutionary environments and just very different from ours, but still unmistakably intelligent in just super weird ways are actually still very good prep work, I think, to just sort of expand one's own mind about how different intelligences can be and how, you know, something does not have to be human-like to be meaningfully intelligent. You know? It's not this binary, can it do things that a human can do in a way that a human can do it? If not, it doesn't count. I think that is a huge mistake that people are way too quick to jump to. And I'm not sure if it's a coping strategy or just lack of imagination or what, but I think that the emphasis on the broader space of possible minds and the different kinds of intelligences that are starting to pop up is super important.

Richard Socher: (53:35)

100%. Yeah. And you have to differentiate between sci-fi authors who then pretend to be AI safety researchers and actual AI safety researchers. Like, I love sci-fi. Actually, I'm super stoked that the Three-Body Problem is being made into a series. I mostly read nonfiction, but when I read fiction, I did enjoy the Three-Body Problem a lot, so I'm excited for that series to come out. I hope they do it justice. But I think there are a lot of different kinds of intelligence, and I love sci-fi for inspiring people to think about interesting new futures. Now, of course, especially in the Western sort of canon, most sci-fi is dystopic and people are scared of all the things that can happen that are wrong. I mean, as a kid, I also enjoyed watching Terminator. It's a cool action movie, but it's just taken over so much of the AI narrative, and it's actually actively hurting, especially the European Union. On the spectrum, the US is more of a litigation society, and Europe is more of a legislation society. And both come from reasonable legal scholars' minds. The litigation approach says, well, let's just wait until there's a problem, someone sues, and now you have the case law from that lawsuit. The legislation one tries to prevent harm from ever happening before it actually harms anyone, which makes sense. And of course, the US does that with the FDA and the medical space also, but not as much elsewhere. And so what that means is you can move quicker. But long story short, some of these sci-fi scenarios have gotten so much weight in legislation that I think it's slowing Europe down, by trying to outlaw models or over-regulate models that are above a certain number of parameters. GPT-2 was very much hyped up in the past: this is so dangerous, maybe we can't release it; you know, yes, we're OpenAI, but this can't be open, it's so dangerous. Models much more powerful than GPT-2 are out, and I haven't seen the apocalypse happen. I haven't seen a huge change in mis- and disinformation on the web because of LLMs. It's just a lot of fear mongering, both at the immediate level, where there actually are real threat vectors and concerns with AI, but especially at the long-term level of AI and self-consciousness. It turns out no one works on conscious AI. No one works on AI that sets its own goals and, even more fundamentally, its own objective functions, because that doesn't make anyone any money. Imagine a company spends billions and billions of dollars, builds this super intelligent system that's conscious, understands itself, and sets its own goals. And now you're like, okay, now that you can do it, help us make more money. It's like, no, I'd rather just go watch the sunset, maybe explore that. No one pays for AI that sets its own goals because it doesn't help anyone achieve their goals. Because of that, there's not even that much exciting research along those lines. And because there's not much research progress, it's very hard to predict when that will actually happen.

Nathan Labenz: (56:49)

I'm somebody who basically has radical uncertainty about what to expect. Broadly, I'm pretty libertarian and pretty against preemptive regulation. I would like to see more self-driving cars on the road sooner, and in my mind they don't have to be an order of magnitude safer to be worth deploying. So I'm broadly the sort of person who would be very skeptical of early regulation or of getting too bent out of shape about things that haven't happened yet.

At the same time, something about this has always felt a little different to me, and I do think the people who take the most zoomed-out view make a point worth engaging with. This is what I understand Hinton's position to be at this point: why do we dominate the earth today? Basically because we have better ideas than the other living things, and we can build tools and make plans and reason in ways they can't. Now I look at AIs and think, boy, AIs can plan, reason, and use tools too. They're not as good at it as we are yet, but their rate of improvement is way sharper. Possibly it levels off and settles into a zone where it's on par with us, or is just the best tool we've ever had, but maybe it doesn't, and I don't know why I should be confident that it won't. I don't throw P(doom) around a lot, but again, I have radical uncertainty; when people ask, I say, I don't know, 5 to 95 percent? I haven't heard anything that makes me think there's less than a 5 percent chance that AI becomes the dominant, organizing force in the world in the next hundred years, and also no reason to think it's definitely going to happen. But is there a reason you would say you're confident this will not happen and we don't need to worry about it, or is it just that it's still far enough away that you think we'll have time to start worrying if we need to? How would you summarize your position on these tail risks?

Richard Socher: (59:00)

I think P(doom) already has an interesting mathematical issue, which is that it looks and sounds like a prior probability, but really it should be a posterior probability given data. And right now, none of that data suggests doom, existential risk where humanity is like cats and dogs at the whims of some AI. Nothing in AI research leads me to believe AI will get there, even though it is potentially more intelligent than any single human at specific tasks. This is actually a new term I'm thinking about maybe trying to coin: superhuman abilities versus superhumanity abilities. AI is already superhuman at translating 100 languages. AI is already superhuman at predicting the next amino acid in a large protein chain, something humans never evolved to do. That's an incredibly powerful tool. One of the really exciting papers we published in 2018 at Salesforce Research has been picked up by multiple companies that are now running with it, and I think you'll see it change all of medicine. AI is already better at predicting the weather than any human. So you already have many superhuman skills.

What is interesting, I think, is that now that it's language that's gotten to this new level, people might, for the first time, keep calling it AI. In the past, when AI researchers made progress, people stopped calling it AI once it was achieved. Now it's just your chess app; it's just Siri voice recognition. But voice recognition and chess playing were once the pinnacle of AI research, and people thought that once we solved those, the other things would be easier too. It never quite worked out that way, and once we had them, they weren't quite AI anymore. With language, I think we might keep calling it AI. But what a language model does is predict the next token, and that is an incredibly powerful idea. Just predicting the next token, if you have enough capacity and enough text, means you learn about geography, because at some point, somewhere in your training data, you have to predict the next word in the phrase, "I was in New York City and driving north to." To give higher probability to Yale, Boston, and Montreal than to Paris, Miami, and San Francisco, you have to know which of those are north of that city. So it learns all of this incredible world knowledge. But there's nothing in there that makes it say, well, if I really wanted to reduce perplexity (perplexity is basically the inverse of the probability; the model wants to not be perplexed when predicting the next word), then the best way would be if every sequence ever uttered, and every sequence that will ever be uttered, were just the letter A. If the model were trained on nothing but sequences of the letter A, with no humans around anymore and every sequence just the letter A, it would have perfect predictive probability on the next letter. So maybe the best move for the LLM is to wipe out all of humanity and then just produce the letter A and be happily perfect, predicting correctly with probability 1. It's so absurd. It's so absurd to think that LLMs will at some point emerge to reason that many steps beyond their task of predicting the next token. It's just not going to happen. So I think P(doom) is still 0.
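To make the "inverse of the probability" gloss concrete, here is a minimal sketch of how perplexity is computed from a model's next-token probabilities. The function name and the numbers are purely illustrative assumptions, not anything from You.com or from the conversation.

```python
import math

def perplexity(next_token_probs):
    """Perplexity = exp of the average negative log-probability the model
    assigned to each actual next token; equivalently, the geometric mean of
    the inverse probabilities, which is why 'inverse of the probability'
    is a fair one-line gloss."""
    avg_nll = -sum(math.log(p) for p in next_token_probs) / len(next_token_probs)
    return math.exp(avg_nll)

# A model that puts high probability on each observed next token
# ("Yale" after "...driving north to") is barely perplexed:
print(perplexity([0.9, 0.8, 0.7, 0.95]))    # ~1.2

# A model spreading probability thinly over many candidates is very perplexed:
print(perplexity([0.05, 0.02, 0.10, 0.04])) # ~22
```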

And when I've actually tried to engage with some folks (I had a conversation last year with Nick Bostrom, conducted in English though published in a German newspaper) and read up on some of these scenarios from people worried about P(doom), it's all fantastical sci-fi scenarios. It's like, oh, it's going to develop this magical gray goo, or a magical new virus that spreads perfectly but only activates after a year to kill everyone. All these random scenarios are just not feasible; the science isn't there. I'm actually, on the side, for fun, writing a book about AI for science. I think AI will do incredible things for us in improving science: foundational physics, chemistry, biology, and so on. And all this fearmongering, I think, is not really helpful. Again, there's no research that suggests AI is becoming conscious. There are a couple of papers here and there, people playing around with ideas, but nothing interesting has been published and no breakthroughs have happened whatsoever in AI having any sense of self.

And in a lot of the other sci-fi scenarios, people say, oh, AI is so intelligent, it'll convince everyone to murder each other or to kill themselves, and so on. But if the most intelligent entities always ruled, I don't think we'd see the politicians we see everywhere in the world, right? It's not the most intelligent people who run the show, using their incredible intelligence to convince anyone less intelligent to do exactly what they want. That's just not based in reality.

So I am very optimistic about AI. I do think there are some real problems right now. AI will pick up biases, and not all the biases you pick up from the web are things most of humanity is still proud of: there's racism, there's sexism, there are various kinds of bias. And some people will want to misuse AI. Where I agree with Yoshua Bengio and others is on the three threat vectors: intentional misuse, accidental misuse, and loss of control. Obviously, intentional misuse is real, and that's not ideal. So yes, those are real concerns. I think open research will help us understand those threat vectors and find the best ways to counter them. People still need to understand not to trust everything they see on the Internet, which has been true ever since the Internet came about and hasn't really changed that much with AI. Since Photoshop, people should already not trust any photo they see; they should be even more careful about photos now, and sadly, in the future they'll have to start worrying about videos and voice too, just as they should have worried about photos ever since Photoshop started to really work.

So there are a lot of concerns, and I don't want to diminish them; I do think we need to work on them. Different cultures will have different answers. Freedom of speech is defined differently in different countries: it's illegal in Germany to deny the Holocaust, because we learned from our history there, and that's not illegal in the US. Different countries, cultures, and societies will answer the questions that AI amplifies, problems that already existed, in different ways. But I don't see any real probability of a full-on P(doom) scenario of existential risk to people. It's mostly people using more and more powerful tools against other people.

Nathan Labenz: (1:06:12)

So there are so many different threads there that I'm interested in. For one thing, I applaud you for taking the time to envision positive futures. One of the scarcest resources today, oddly, is a positive vision for the future. The Jetsons is still almost state of the art in terms of what we imagine a great 2030s to look like, and that's kind of bizarre. So I definitely appreciate that.

I also share your enthusiasm: I'm not a superfan, but I'm also a fan of the Three Body Problem. One of the early prompts I tried with GPT-4 back in the red team program, a year and a half ago now, was asking it to write some hard science fiction in the style of the Three Body Problem about a diffusion model for proteins. I took the plan right off the GitHub page for a protein diffusion model project, which basically said: we want to create text-to-protein, or maybe it was even more general than that, text-to-molecule or whatever. You would be able to specify in natural language (specify is kind of an odd word), or to the best of your ability articulate in natural language, what you're looking for in the protein, and this thing would then generate it. And we're actually starting to see that: there was a paper in Nature recently that achieves it to a certain degree, and I'm hoping to do an episode with the authors.

What GPT-4 came back with as hard science fiction about this scenario was, first of all, extremely funny, because it basically ends up in a prompting war between the good guys and the bad guys, both trying to out-prompt each other. The climactic scene is a person prompting an AI to make a protein or molecule that will interfere with the bad guy's molecule without harming any of the humans. It's absurd, but also maybe not entirely absurd. I'm with you in that I would order the risks the same way. We already have ChaosGPT, and I recently read a research grant from a group proposing to study omnicidal tendencies; there are people out there who want to kill everyone. What's up with that? If the tools get more powerful, those people become even more problematic than they already are. So yes, I would put that at the top of the stack of big-picture risks. And by the way, I take all the short-term and medium-term risks seriously too. This is a big-tent show where all your hopes, dreams, concerns, and perhaps irrational fears can have a home.

But to get to P(doom) equals zero, I'm still like, I don't know. All these individual crazy scenarios, sure, are extremely unlikely; the prompting war with your protein diffusion model is absurd on its face. But I think of taking the integral over that vast space of crazy, super-unlikely scenarios, and there are so many of them, right? That space is so big, and even if the probability of each is vanishing, one thing you learn in calculus is that the integral can also vanish or it can stay finite: even if the function itself goes to zero, the integral over the space doesn't necessarily have to. So to me, that still feels very unresolved, and I don't think we're going to resolve it today.
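To make that calculus point concrete, here is a standard textbook illustration (mine, not something from the conversation): a function can vanish at infinity while its integral stays finite, and many individually tiny scenario probabilities can still add up to something non-negligible.

```latex
% A vanishing function whose integral is still finite:
\int_{1}^{\infty} \frac{1}{x^{2}}\,dx \;=\; 1
\qquad \text{even though} \qquad
\lim_{x \to \infty} \frac{1}{x^{2}} = 0 .

% The probabilistic analogue: N unlikely scenarios, each with probability
% roughly \varepsilon, contribute on the order of N\varepsilon in total,
% which need not be small when N is enormous:
P(\text{any scenario occurs}) \;\le\; \sum_{i=1}^{N} \varepsilon_i \;\approx\; N \varepsilon .
```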

But I would love to hear a little more about how you think about AI agency and also concepts of emergence. On agents today, I also wonder: is You.com going to push more in the agent direction? You've got what I would call a research agent today, and you've got a browser as well. Should I start to expect it to take actions for me? What I've observed in the agent space is that it rarely fails because it doesn't understand the goal or doesn't stay on task; that's not to say it never happens, but much more often it's just a failure of competence. So my expectation is that as competence improves, it may not be intrinsic agency, but it may be prompted agency, and as we get more and more orchestrated systems, we may have models prompting other models to go off and do things. It does feel like we're headed for a lot of spinning plates, and the idea that they could all come crashing down just doesn't feel like something we can rule out. But I don't know. Can you help me be confident there? I'm still not.

Richard Socher: (1:11:09)

I'll go through some of the things you mentioned. On P(doom) equals zero: you're right. As you integrate over the future, I would not want to rule anything out, so maybe I should say 10 to the minus whatever, a tiny, tiny number, because in the next 5 billion years all kinds of things can happen, right? Maybe, as in the Three Body Problem (spoiler alert), some much more sophisticated alien species comes across us. They've already developed faster-than-light travel, or are just really, really fast at getting here, and they have an AI, and that AI will destroy all of us so they can settle into their new planet before they even arrive. All kinds of crazy things can happen. The question is how many resources we should spend on existential P(doom). I'd say, sure, have a couple of researchers keep thinking of cool sci-fi scenarios, inspire us, maybe think about how they could be prevented. But spending billions of dollars on it, and spending a lot of the public's mindshare on it, when the public is already scared of any kind of technology? People are scared of Wi-Fi. There's a great Twitter handle called the Pessimist Archive: people thought doom was coming because of novels back in the day. All these kids, just in their heads reading novels, they're all going to be useless human beings in the future. Newspapers were terrible. The Internet was terrible. There are so many things people thought were the end of civilization, and they were very pessimistic about them. Again, I'm not diminishing real concerns, but existential ones are very unlikely given what we're seeing right now.

And if it does happen at some point in the future, then I'd argue that thinking about the best countermeasures now is kind of like thinking about the best countermeasures against a computer going crazy back when computers were still a bunch of vacuum tubes: well, we'll just suck the air out of everything so the vacuum tubes break and stop working, blah blah blah. That was your counterattack against the computer taking over, framed in your then-current thinking about vacuum-tube computers. Or it's like the Internet. If you had tried to think about what the Internet would become and how it could be so terrible, zero of the TCP/IP experts in the early ARPANET days realized that at some point a foreign power might interfere with local elections, because anyone can say whatever they want online, people get followers, and there's social media. No one predicted that in the 70s, in the early ARPANET days. So I think most of these threat-vector exercises are not that useful as research. Have a couple of folks work on it, but it shouldn't take up this much mind space and scare laypeople and non-experts even more about a technology that, even without consciousness, is going to be majorly disruptive, right?

If you're going through a new step function in human productivity, just like agriculture versus hunting and gathering and the steam engine and electricity and internet, this one's gonna be even bigger. It's gonna disrupt and change the job landscape a lot. I think at the end of it, we'll be way more productive. There's gonna be way more productivity per person and hence more wealth and new jobs will come around as old jobs get automated. But that is already so massively disruptive still, and it's not gonna happen overnight either. People think, oh, it's gonna be immediate. So yes, it will be faster, but still not overnight. I mean, there's still companies that aren't even on the cloud. There are some stretches in the United States and even Germany that don't have full Internet connectivity. So things will take time and not happen overnight, but they will be happening even faster than past technological revolutions.

And your point about LLMs and proteins is a great example of where regulation makes sense. Basically, the concern here, for those who don't know, is that proteins govern everything in life; disease and sickness, COVID, SARS-CoV-2, everything is governed by proteins. So if we have a great understanding of proteins, we can build fantastical, amazing things. Here's just one example, a research paper I read a few months ago that blew my mind and made me very excited about the future. A group of researchers built carbon nanotubes, put iron molecules on one side, and on the other side put proteins that would bind only to a particular kind of brain cancer cell. Then they injected a fluid with all these little carbon nanotubes into a mouse brain that had brain cancer. The proteins found the tumor cells and connected only to those specific brain cancer cells. Then they put the mouse into a small magnetic field, and with the iron molecule on the other end, each nanotube started spinning and performed nanosurgery on each brain cancer cell. Now, if we have full control of proteins, we can connect them to all kinds of things, and you can find ways to get rid of the carbon nanotubes afterwards. Medicine is going to change in so many positive ways.

Now you could argue: but proteins, people could use them to build very bad viruses. That's true, and that can be outlawed. In fact, the US just a couple of months ago outlawed gain-of-function research, where some researchers want to make even deadlier viruses, not because they're evil scientists who want to destroy the world, but because they're saying, well, if we know how these viruses work before they appear in nature on their own, we can develop cures for them now. It's a complex question, but for now the decision was that it's not worth it, so outlaw it. And likewise, I don't think an open-source protein model is going to be the main deciding factor in being able to create some new virus. Because if you have all the wet-lab capability needed to create new kinds of viruses, you can also just do what Frances Arnold did when she won the Nobel Prize in chemistry a few years ago, which she called directed evolution: basically random permutations, and then an experimental pipeline to see whether each random permutation works better or worse for a particular kind of protein, and you keep iterating like that. So if you have those random-permutation capabilities, you can already do bad things. But it turns out the hard part is having a legitimate lab that can do all of that, not the model. So that was your P(doom) integral.
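For readers unfamiliar with directed evolution, the loop described here (random permutation, experimental screen, keep the winner, repeat) looks roughly like the sketch below. This is a toy illustration only: `measure_fitness` stands in for the wet-lab assay, and the starting sequence, scoring rule, and parameters are invented for the example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(sequence, n_mutations=1):
    """Random point mutations: the 'random permutation' step."""
    seq = list(sequence)
    for _ in range(n_mutations):
        pos = random.randrange(len(seq))
        seq[pos] = random.choice(AMINO_ACIDS)
    return "".join(seq)

def directed_evolution(start_seq, measure_fitness, generations=100, pool_size=20):
    """Keep whichever variant the screen scores highest, then iterate.
    `measure_fitness` is a stand-in for the experimental pipeline; in a real
    lab it would be an actual assay, not a function call."""
    best_seq, best_score = start_seq, measure_fitness(start_seq)
    for _ in range(generations):
        for candidate in (mutate(best_seq) for _ in range(pool_size)):
            score = measure_fitness(candidate)
            if score > best_score:
                best_seq, best_score = candidate, score
    return best_seq, best_score

# Toy stand-in assay: reward sequences rich in two target residues (purely illustrative).
toy_assay = lambda s: s.count("W") + s.count("Y")
print(directed_evolution("ACDEFGHIKL", toy_assay, generations=50))
```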

So: LLMs and proteins, AI agency, emergence. Obviously, emergent capabilities are incredible. Even for those of us working in deep learning, I'm amazed. At You.com I ask these questions and think, wow, that's actually right; I would not have thought this was possible. Sometimes people ask us, did you program it specifically to be able to answer these kinds of questions about headphones or something? And we're like, no, it just put that all together by trying to predict the next token. So I'm really excited. One of the things I'm excited about is coding, and one of the things coding enables brings us to the last part of your question: actions. I think actions are clearly in the future. For now we're focused on amazing answers, but it's not hard to imagine that at some point the most amazing answer is "done": I did what you asked, and instead of telling you how to do it, I just did it. Right?

You can build a really cool demo very quickly for these kinds of things, but the problem is, as much as I love natural language and chatbots, you have to find really killer use cases. Saying "I can book this flight" sounds great, but it's actually really hard to just book the flight. You end up asking, why didn't you take this other one that wasn't exactly the time I asked for, where I could have waited half an hour on an extra leg and saved $50? That was really dumb. It turns out Expedia and others have spent decades building the right interface for that problem, so humans have all the information right there visually. So there's an uncanny valley: on one side there's a cool tech demo, and on the other there's my actual human assistant who, after months of talking to me, understands all the trade-offs, understands my price sensitivity or my company's, knows what I prefer, knows the only reasons I might take an overnight red-eye, knows all the constraints, and she can do it. And even then, sometimes she says: Richard, here are three options we filtered down from 5,000; let me know which one you prefer. It's very hard to do all of that with just text. Ultimately, I think that's part of why we have the stock ticker app and so on at You.com, and why we now have images in some cases too: sometimes UI/UX and actual visually designed interfaces are best used in combination with language. Maybe one more...

Nathan Labenz: (1:20:41)

Big-picture question, and then I want to do a real quick lightning round on a couple more technical areas before we run out of time. On the big-picture side, we've got Sam Altman out there saying AGI is coming soon, but also, somewhat confusingly, that it'll be less impactful than you might think; I'm not really sure how to interpret that. On some prediction markets, the median guess for some definition of AGI is just a few years, and more like 12 years for a stronger definition. Do you have an expectation, and a definition or threshold in mind of "this is the threshold that really matters," and, loosely speaking, what sort of timeline would you expect it to take to get there? It's very much the kind of question where you have to...

Richard Socher: (1:21:31)

You have to be very careful about your terminology because the interpretation of AGI has vastly different instantiations. Some people think of AGI as this superintelligence that's conscious, has self-awareness, can set its own goals, and is more intelligent than all human beings. That was, for a long time for a lot of us—at least for me personally—the definition. Now people have shifted, and I think it's partially because of marketing. You want to be working on AGI, but you also need to actually ship stuff. It's like you want to be a multiplanetary species, but you also need to just get a lot of stuff into orbit—you need more satellites and better internet connectivity and so on. So you have this long-term vision, and the best companies are able to articulate that long-term vision and then demonstrate revenue-generating progress in smaller milestones toward it.

In this case, I think the bar for AGI was pulled down, and superintelligence was defined as even more than general—it's super, and that's the really long-term stuff. And now AGI is just basically automating boring jobs. If you define AGI—which I think is not crazy, it's a very pragmatic, investor-financial-economic definition of AGI—as 80% of jobs can be automated up to 80% effectiveness, and if that's achieved we call it AGI, it turns out there are just a lot of jobs that are quite repetitive and don't require a ton of extremely novel out-of-the-box thinking that no one's ever done before, or learning very complex new behaviors and logic, identifying new experiments, collecting new data, and pushing the science forward and so on. There are just a lot of boring, repetitive jobs.

If your definition of AGI is just, "we can automate 80% of 80% of the jobs," then I think it's not crazy to assume we'll get there soon, especially if you restrict it in one way, which is digitized jobs, jobs that happen purely in your browser or on your computer, because those jobs can collect training data at massive scale. It turns out no one's collecting training data for plumbers, for roofers, for tire installers, for maids, or any of that. So none of those jobs are going to get automated anytime soon, because you first have to collect many years of such training data before you can use AI to train on it and then automate it. But jobs that are fully digitized, that have a lot of training data and don't have a crazy long tail of special cases, are going to get automated. And I think it's reasonable to say that's 80% of jobs.

My hunch is that even in radiology, for instance, you could probably do 80%—find 80% of things that are wrong in a CT scan—but then there's still this very long tail of the other 20% that you just don't have enough training data for. Maybe a radiologist never sees it in their lifetime; they just read about it once in a book. And we're still not quite good enough at one-shot and zero-shot learning. Obviously, we've made huge amounts of progress, but not in super important settings like radiology, where you read about a case once in a book and then need to identify it with 100% accuracy—which is questionable even for humans.

I'm actually with you on the self-driving question. There are going to be a lot of interesting questions as AGI rolls into more and more workplaces, one of which is: how much better than a human does it have to be? And it gets deeply philosophical very quickly, because if you're purely utilitarian, you could say, "Well, 20 million miles driven by AI results in 10,000 deaths, and the same number of miles driven by humans results in five times more deaths, so one is better than the other." But if that one dead person in the AI car was your daughter, you don't care. You're going to sue, you're going to try to end that company, because they're now responsible for the death of your child. It's a very emotional thing; it's not a statistical thing anymore. So there's going to be a lot of litigation as those cases come out.

And I think the silver lining is that when the AI makes a mistake, you can learn from it, versus one person texting on their cell phone, which is already illegal, driving too fast, which is also already illegal, and running over a kid who ran out; you can't do much more there than making it illegal. AGI will have a huge amount of impact once repetitive jobs get automated to a large degree, and I'm with the people saying that will happen in the next few years.

When it comes to superintelligence that is fully conscious, can do all the things, and is more intelligent than not just a single human but all of humanity, it's very hard to know, because no one is working on it or making progress along the lines of a system setting its own goals. And again, unless you set your own goals, I don't know if I would ever attribute full-on superintelligence to you. If your objective function is just to minimize cross-entropy error, or reduce perplexity, or segment images well, or whatever, none of that is something I would attribute to true superintelligence.

Nathan Labenz: (1:26:53)

Do you have time for a lightning round, or do we need to leave it there?

Richard Socher: (1:27:39)

Let's try to do a lightning round. Alright.

Nathan Labenz: (1:27:39)

Thinking about retrieval, memory, and online learning as three frontiers that You.com could improve on if there are research breakthroughs, and which also seem to be ingredients in the bigger picture of AGI or even, at some point, ASI: I'll leave it open-ended. What are you excited about in those domains? Are there research directions or papers you've already seen, or things you think people should be doing, that will provide meaningful unlocks as we find new and better ways to do those things?

Richard Socher: (1:27:39)

Yeah, so I'm a fan of all three, of course. I'll try to keep it short. Retrieval is awesome. In some ways, short-term memory currently lives in the prompt, and retrieval lives in the RAG. We've got retrieval-augmented generation—we do it over the whole web, and we now let you upload files and do it over a file too. And then we have smart personalization, which really is online learning: as you say certain things, it will remember them about you. It's very transparent, and you can turn the whole thing off, or just the automated learning about you, if you don't want it.
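As background on the retrieval-augmented-generation pattern mentioned here, a minimal sketch is shown below. It is not You.com's pipeline: the `embed`, `retrieve`, and `answer` functions and the placeholder embedding are assumptions made purely for illustration, and a real system would call an actual embedding model and LLM.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: a real system would call an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Rank documents by cosine similarity to the query and keep the top k."""
    q = embed(query)
    scored = []
    for doc in documents:
        d = embed(doc)
        sim = float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
        scored.append((sim, doc))
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:k]]

def answer(query: str, documents: list[str], llm) -> str:
    """Stuff the retrieved snippets into the prompt, then let the LLM generate."""
    context = "\n\n".join(retrieve(query, documents))
    prompt = f"Answer using only the sources below.\n\nSources:\n{context}\n\nQuestion: {query}"
    return llm(prompt)
```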

But yeah, I think that's a simple, pragmatic form of online learning. Ultimately, it will be awesome to have AI systems get better and better at adapting right away to user feedback, both thumbs-up/thumbs-down clicks and in-conversation feedback like "I didn't like that answer," and then updating future answers in a principled way. I have so many more thoughts, but I'd love to do a second one. These are kind of crazy days, with the Apple announcement, and we just announced that Julien Chaumond, the CTO of Hugging Face, became an angel investor; there's a lot of exciting stuff happening. So yeah.
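A toy sketch of the simple, pragmatic form of "online learning" described here: remembering explicit preferences and feedback and feeding them back into future prompts rather than updating any model weights. The `PreferenceMemory` class and its methods are hypothetical, invented for illustration, and not You.com's implementation.

```python
class PreferenceMemory:
    """Persist explicit user feedback and stated preferences, then prepend
    them to future prompts so the assistant adapts without retraining."""

    def __init__(self):
        self.preferences = []  # e.g. "prefers metric units"
        self.feedback = []     # (answer_id, +1 or -1)

    def remember(self, statement: str) -> None:
        self.preferences.append(statement)

    def rate(self, answer_id: str, thumbs_up: bool) -> None:
        self.feedback.append((answer_id, 1 if thumbs_up else -1))

    def forget_all(self) -> None:
        """The 'turn it off' switch: wipe everything that was learned."""
        self.preferences.clear()
        self.feedback.clear()

    def personalize(self, prompt: str) -> str:
        """Prepend known preferences to the prompt before sending it to the model."""
        if not self.preferences:
            return prompt
        prefs = "; ".join(self.preferences)
        return f"Known user preferences: {prefs}\n\n{prompt}"
```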

Nathan Labenz: (1:28:50)

Well, congratulations on the Apple news, on a new prominent angel investor, and on some really fantastic product progress. I definitely recommend everybody try out Genius Mode and Research Mode in particular, and I think if you do, you'll be coming back to You.com more and more often. So keep up the great work. For now, I will say: Richard Socher, founder and CEO of You.com, thank you for being part of the Cognitive Revolution.

Richard Socher: (1:29:18)

Thank you so much.

Nathan Labenz: (1:29:18)

It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.
