The Pixel Revolution with Playground AI's Suhail Doshi

00:00 Intro
00:30 Image technology breakthroughs
00:45 The Cognitive Revolution
03:10 Intro to Suhail
03:43 Sponsor
07:04 Suhail’s path to AI
12:57 Suhail’s vision for Playground
17:58 Suhail’s cancellation on Twitter
19:06 AI artists are artists
20:25 Non-AI artists feel threatened by AI
26:47 Suhail on defensibility at AI companies
30:30 Playground’s product roadmap
33:01 Suhail on good design
35:55 Latent space
44:04 Building Playground with lessons learned from Mighty
53:55 Monitoring led to 2x improvement in the product
55:53 How people use Playground
01:03:48 AI is having its mobile moment
01:15:12 Finding, investing and building durable AI companies
01:22:06 Suhail’s kid’s best friends may be AI characters
01:23:20 AI religion
01:24:16 Conclusion

Thank you Omneky for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real time advertising data, to generate personalized experiences at scale.

Thank you Graham Bessellieu for editing and production.

Twitter:
@CogRev_Podcast
@eriktorenberg (Erik)
@suhail (Suhail)
@labenz (Nathan)

Websites:
Playgroundai.com
Mixpanel.com
Waymark.com


Full Transcript

Suhail Doshi: (0:00) It's kind of like that moment in mobile where it was just a complete Wild West gold rush, and nobody had a clue what was going to work. And so thousands, tens of thousands of experiments occurred simultaneously. You end up getting things like Uber and some things grow really rapidly and then fall because they fall out of relevance. They don't have good retention. Some things just have this incredible lasting power for a decade. So it feels like that will happen again.

Nathan Labenz: (0:30) In one short year, image generation technology has achieved multiple breakthroughs and revolutionized the world of creativity and art. Today, with mere words, in just seconds, anyone can generate all sorts of high-quality images, and dedicated AI artists can create top-notch, award-winning art. So Nathan, how did we get here? Well, Erik, do you remember where you were when you first saw the DALL-E avocado chair? That was the original DALL-E, announced two years ago in January 2021. DALL-E never launched to the public, but OpenAI did open source the image-text model CLIP, which turned out to be all that the programmer-artist community needed to make some incredible breakthroughs. Within months, we started to see CLIP-guided image generation models. And by late 2021, they were producing amazing results. The core idea, known as denoising, was, like many AI breakthroughs, both simple and profound. The world already contained massive datasets of images and captions, and it's easy to degrade images by adding a bit of noise. So if an AI model could be trained to answer the question, "What would this image look like if it were just a little less noisy and a little more like the user's request?", then it would, in theory, become possible to go from pure noise to high-resolution images simply by denoising over and over again. Sure enough, it worked. And in April 2022, OpenAI, building on the open source community's work, launched DALL-E 2. They were very careful at first. For a while, you couldn't even generate realistic human faces. But their monopoly position and editorial control were short-lived, because just four months later, Stability AI dropped the weights of Stable Diffusion, which roughly matched DALL-E 2 in capabilities and unleashed an unprecedented wave of experimentation and creativity worldwide. Since then, image generation has scaled faster than almost any technology in human history. It's inspired entirely new products and become ubiquitous in familiar products as well. Artists have been split on the topic. Huge numbers of creative people have developed all-new workflows, techniques, and styles, which were either previously impossible or prohibitively expensive, but many have also felt understandably threatened. After all, what happens to artists when anyone can create art? And it's precisely this wave of change that our guest Suhail Doshi of Playground AI has dived into. While working to build a computer in the cloud with Mighty, Suhail noticed the takeoff in AI capabilities and just couldn't look away. A few months later, he declared that he was going all in on AI, and Playground AI was born. Playground AI is one of the highest-performing AI image creators in the world today, with one of the most generous free plans anywhere. Suhail's vision is to give users complete control over pixels: not just text-to-image, but "what you think is what you get" for image, video, and 3D.
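
To make the denoising idea concrete, here is a minimal toy sketch of the sampling loop. The predict_less_noisy function, the target image, and the step schedule are hypothetical stand-ins for a trained, text-conditioned diffusion model, not how DALL-E 2 or Stable Diffusion are actually implemented.

```python
# Toy sketch of iterative denoising: start from pure noise, repeatedly ask a
# "denoiser" for a slightly cleaner estimate, and end up at an image.
import numpy as np

def predict_less_noisy(noisy_image, step, total_steps, target):
    # Hypothetical stand-in for a trained denoiser conditioned on the user's request:
    # nudge the current estimate a little toward the (in practice unknown) target.
    blend = 1.0 / (total_steps - step)
    return noisy_image + blend * (target - noisy_image)

rng = np.random.default_rng(0)
target = rng.random((64, 64, 3))          # pretend this is "what the user asked for"
image = rng.standard_normal((64, 64, 3))  # start from pure noise

total_steps = 50
for step in range(total_steps):
    image = predict_less_noisy(image, step, total_steps, target)

print(float(np.abs(image - target).mean()))  # close to 0 after all the steps
```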

Nathan Labenz: (3:40) The Cognitive Revolution podcast is supported by Omneky. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with the click of a button. Omneky combines generative AI and real-time advertising data to generate personalized experiences at scale.

Nathan Labenz: (3:59) Suhail, welcome to the Cognitive Revolution podcast. Stoked to have you. By way of introduction, why don't you talk about the evolution for you: what was the AI moment when you realized, "I have to go all in on this"? What was that moment like? And give some background on what you were up to in the process.

Suhail Doshi: (4:17) I think sometime around early last year, we were making a browser called Mighty, and the goal of Mighty was to make a new kind of computer. And we were hoping that if we could put it in a data center, that would allow us to make a computer that no one had ever imagined before. It'd be a lot faster at a whole number of tasks that you would use web apps for. It was extremely controversial. But one of the things that I started to poke at as the team was building the browser was I started to wonder, you know, just watching the steady advancements of AI, even just early last year, around April, May. I think in April, DALL-E came out. Things like that started to unfurl. And I started to wonder if we could make big improvements to the address bar in the browser. It turns out we know a lot about the address bar in Chromium. We know too much. But one of the things we learned is that it actually doesn't have a lot of intelligence to it. It's actually not very smart. It's quite dumb. And I'm sure the PMs at Google are terrified to change this address bar, because if they make any change and search yield goes down, they could lose billions of dollars. So this code is very crufty; it's almost unchanged for the last five years. And, you know, I thought to myself, well, could we make this better? And there's this funny thing in the address bar where, at least at that time, if you went to a recent link, something you go to frequently, it would always be at the bottom of the address bar, not the top. And so there are all these inefficiencies with the address bar. We use it every day, hundreds of times, thousands of times a day. And so I just thought, boy, you know, it sure would be cool if we could make a better address bar predictor. So if you typed something in, we could just predict where you would want to go. And so we started to try to collect all this information and try to make a better address bar predictor. It was all opt-in with our users and our own staff. And so we started to see if we could do that, and that really gave me the bug. It was at that point that I had to go and do research and figure out how I would go do this, and I started meeting more AI researchers in the community. That really gave me the bug of, wow, these things are really, really amazing. And I just kept finding more things; the pace, the momentum of everything that was happening was incredible. And I even remember just feeling like, gosh, GPT-3 had been out for a couple of years and still nothing was very close to its performance. I think that's maybe a little bit different now. There are more arguments to say, you know, folks like Anthropic and whatever are competing to a degree. But it's actually quite old. And so then we started working on things like summarization in the browser. So if you went to a blog post, could we use GPT-3 to summarize it? Because people don't read that much. So could we summarize everything in three bullet points? And then we started to learn about hallucinations with AI. We just kept thinking of more features that were AI-focused but for the browser, and that kind of started my first foray into everything.

Nathan Labenz: (7:41) So how did you explore the idea in terms of, okay, this is the Playground product that I want to build?

Suhail Doshi: (7:48) I remember staying up late one night. You know, I had been kind of contemplating, I had been working on a Windows launch for Mighty, and it was just taking a long time. And I didn't have a lot to do, but I had been constantly thinking about AI. It was this thing that I couldn't get out of my head for some reason. And so I started to, I remember it was 11:00 at night, and I just went to my Apple Notes, and I wrote down every company for every part of the space. I did my own market map. There was no market map, so I just did my own in my Apple Notes. And I remember writing who's doing logging and visualization, which was more or less Mixpanel for AI training. And it was Weights and Biases. And I remember writing, you know, all these different companies. You know, there's replicate.com, which does inference. There's all these various infrastructure companies, and then there's the foundation model companies and so on and so forth, you know, image generation, everything. And I remember getting to this end part where I wrote, I was like, okay, all these things are kind of filled. Not all these things are very interesting to me. I would rather if I were to work on anything, I'd want to work on actual research. That's the most interesting part to me. It's hard, hard won stuff, and true invention. And I remember getting to prompt. Who's doing anything with prompts? It's a very silly idea at the moment, but one thing I had remembered was I had played a lot around with OpenAI's Playground Editor, and the UI just kept getting more complicated, more and more sophisticated. It went from, I remember two years ago, it was just a text box, basically. And now on the right side, you have this pane, and it had all these sliders and things that you could mess with different models. And they even had this way where you could insert text, and then it would fill in the text surrounding the text. And I was like, holy shit. This is turning into a product that they can't even maintain. It went from a demo area, just mess around so that you go and use their API, and it turned into this crazy product. And then, you know, there's Jasper, which has also made a whole UI around the API just for content marketing. And you could just tell that there's all this craziness going on. The prompt itself needed a product, just to play with it. And so I had this moment of, you know, it doesn't seem like there's, other than Jasper, there doesn't seem like there was anything at that time.

Suhail Doshi: (10:31) Just do it. And then there was this other thing that was happening, which is, I think, that Stable Diffusion dropped. You know, I had gotten a preview from Emad, a week or two before it dropped. I think I got the weights, and I could mess around in a notebook. And then that thing happened. And so there was DALL-E, Stable Diffusion, and then the prompts were very interesting there, very intricate. And so, you know, it just dawned on me that, actually, you know, what we don't want is a text box. What we really want are really great UI controls to mess around with these things to get really good results. And because I kind of felt this thing that was about to happen with text with GPT-3, that complicated UI, I felt very strongly that that was absolutely going to happen with images. It didn't make sense to me that we'd live in a command line mode forever with images. So I think that just completely clicked, and then you could totally see how the product could evolve from there. And so far, that's been true.

Nathan Labenz: (11:35) If you were to imagine Playground in, say, two years, what is it that you hope to create for people?

Suhail Doshi: (11:42) Yeah, we do have a pretty clear plan for what we want to do with Playground, but I can give you the midterm. Right now, we want to combine great AI research and product design to invent a new kind of creative tool. We're not trying to replace Photoshop or Illustrator. You could think of this new creative tool as something that could have been its own tool in the Adobe Creative Suite. That's probably our starting point. We suspect that there is something new there. One thing that's really different than the products within the Adobe Creative Suite is we're not really targeting the users of Adobe Photoshop or Illustrator. Those users already have great skills. They already know how to use these tools. There's plenty of tutorials and content for them. They can get really fine-grained results. What we want to do is try to target all the people that maybe don't have those skills. So if you wanted to add a necklace to a woman, you could do that, and it would be this really subtle change. And as long as you had good taste, then you can be really happy with the results, and other people could be happy with the result. So probably the midterm goal right now is just making a really fantastic creative tool. Right now, Playground, for the most part, is kind of a toy. We're kind of this glorified image generator—write some text, get an image. One of the problems with that is that you can't really make subtle changes. You don't have a lot of control. It's a lot like a loot box. You just type some words in, get an image. And so it sure would be nice if we can start to have more control, have more subtle edits, and have more creative control over these things. It's not just a prompt. So I think that would be our midterm goal—an extremely strong AI-first creative tool that lets you make any image that you can imagine. And then you can use text or you can use controls, UI controls just like any other tool.

Nathan Labenz: (13:39) Talk about the long-term goal as well. Paint the vision for us.

Suhail Doshi: (13:43) Yeah. So I think what we want to try to do is we're trying to go after the domain and modality of pixels. There's a whole bunch of companies right now that are going after language—so many companies going after language. And I think you could probably think what we're trying to do is we're trying to make a large language image model. I don't know if there's a word for this, so I'm just going to make one up, but we're trying to make an LLIM or something. But basically, our goal is to make something that can do what a skilled person could accomplish with Photoshop or Illustrator. That would be the midterm. The long term would be something that can create, edit, and understand pixels. So for instance, imagine I were to take your face, Erik, and I could put you in this amazing scene in The Matrix with a red trench coat jacket. And then we'd want an instruction that could say, "Hey, can you change that jacket to black? Oh, by the way, I'd actually like the gun to be held in his left hand, not his right hand." So that'd be an example of creating something from thin air, but then also instructing it to make it highly editable to make even the subtlest change, but still capturing the essence of everything that you want it to be. And then understanding could be something like, imagine there was a video and there's 30 seconds of it. It sure would be nice to make some kind of large language image model that could understand—maybe after 30 seconds of the video elapse, could we describe what happened? Could we say everything that occurred in that video and summarize it, for instance? So I think the long-term plan is just to kind of really be focused on pixels, pixel space. In time, hopefully, we could do things with 3D if and when there's a market there. Definitely video. Really, anything with pixels is really exciting to us. So for the most part right now, we're just building an AI research team, building little blocks. And I think a lot of those little blocks that we research—little tiny models that do really incredible things—in turn will help us go and build a really great large language image model. So I think this year, we should have our version of what would be maybe a GPT-2 for pixels, something like that.

Nathan Labenz: (15:57) Fascinating. We're going to get into the weeds of Playground, but first, I want to ask a higher-level question, which is the controversy. Because this is very new for people. I remember you had a couple threads early on just introducing your work, talking about it, and they went viral on Twitter. Some communities found them and got really scared. Other people, we've had other friends who've created AI-generated art, and some people got really excited about it, some people got really mad about it. What either surprised you about it, or how have we seen that even evolve just in the past few months as people have gotten more used to it or more hip to this thing that's happening? And how do you see that playing out?

Suhail Doshi: (16:36) I think around October, I got canceled by the art community for saying something that I didn't actually think was controversial. I didn't know at the moment. All I said was, "Wow, I really believe that AI art is art." And then I showed some images of stuff that I had made that I just thought was really cool and amazing. And the—yeah, I hate to say the art community because I do think AI art is art—but the non-AI art community subgenre was really upset with this. And yeah, I ended up getting super canceled for a week on Reddit and Twitter. I didn't even find the Reddit post until a month later. I was like, "Oh, I got canceled on Reddit. I didn't know."

Nathan Labenz: (17:25) But that's how you know you're really canceled, you can't even find all the information.

Suhail Doshi: (17:29) I'm just kidding. Yeah. I guess they blocked out my name, and then no one told me, and then I found it later. And so I just thought that was so interesting. And I think that, yeah, at first, of course, you feel—internally, you feel a little defensive and whatnot. So I think there are a lot of people that are really defensive about this on the AI art side. They obviously feel like artists. I mean, we interact with them every day. They definitely feel like artists. Weird things are already happening now where we have the AI art people signing their name on their images when they put it on Playground. They see these little signatures from some of our top users, and they're really, really good, and people don't know how to replicate anything that they're doing. I think them not being able to easily reproduce their work is a good sign of what's coming. Yeah. And if anything, we're making a tool that's going to make that harder. So it's not just going to be like "a tiger by Greg Rutkowski." And then anyone can make that, and they're just, quote, "stealing Greg Rutkowski's work." So I think that whole thing is going to go away. But the controversy was really interesting because, I mean, I definitely got all kinds of weird things, like death threats and mean insults. But there were occasional people. There was an occasional bright light where, eventually, I went from kind of feeling defensive to I decided to be more curious. And I started talking to some of these users that are really upset, at least the ones that kind of could have more of a constructive conversation. And I don't think that either side is going to be convinced of one another. I don't think anyone's looking to convince the other ones. But I did have a conversation, and I do think that they're—for the most part, people just feel really threatened. They feel really, really threatened. There's this thing that they really love and they're passionate about, and now there's this other thing that comes along and basically just makes what they feel like they love and do—that they sometimes can barely make any money from—it makes them feel like it's all obsolete. And the tech audience, the tech-focused audience, is not really sensitive to this. The tech audience loves this kind of thing. They're like, "Great, we've made software that can automate all these things that we need. We love that as programmers. Automation's great." But on the art creative side, this is very threatening and very not exciting. And I just had this back and forth with this user, and I said, "Hey, maybe you can find ways to incorporate this in your workflow. You could probably get an advantage because you know how to draw and you have more skills and expertise that's adaptable." And so while I think that's going to be true for some audiences, I mean, there are just some people that just have this love for drawing with a pencil. And a piece of software is never going to change that for them, and it would be not fun for them. But I also think that a software tool isn't going to replace that anytime soon. The joy of drawing and people having manually drawn something is really, really great. And it reminds me of listening to an artist on Spotify, and then there's going to their concert. Going to the concert is definitely worse audio, but you go because there's this concert, and it's fun, and it's exciting, and there's this touch to it. And I just don't think these things are going to get replaced. 
I really believe that human plus machine is the best end result, at least for art. Art is for us. It's for humans. So

Nathan Labenz: (20:52) Is it the right mental model to say that this advancement is going to make the best artists even more valuable, economically valuable? It's going to power them. It's going to make non-artists—it's going to give them artistic capabilities. And for some middling artists, it's going to hurt their economic prospects. How would you edit that summary of the mental model of what to expect?

Suhail Doshi: (21:16) I mean, I produce music, so I think a lot about this in terms of the analogy of music. There was a time when people made music, and the way you made music was you had to learn the instrument. Right? And then along came the drum machine, and it was cheaper, and you could sample inside of it, and you could get an 8-bar loop. And the result of that was, one of the best results of that, in my opinion, was that we got amazing hip hop, which changed music. Now you could still be in a band, and you could still make the same music you made 20 years ago, but then we got hip hop. That was awesome. Some of the biggest artists in the world were influenced by people that had those drum machines. And then we went from the drum machine to real computers, and then you had digital audio workstations, DAWs, like Ableton and Fruity Loops and stuff like that—Logic, whatever people use. I use Ableton. And then we could get an entire symphony in our computer. Then we could make sounds that nobody could make. There were no instruments anymore; the computer was now a customized instrument for every kind of sound. And so now you get artists like Skrillex that make sounds that you just couldn't even imagine. And the reason why I bring up that analogy is because I think that constantly happens with art, any kind of art, not just pixel-based art. The art changes and it evolves and new possibilities occur. And so I think what's going to happen is that we're going to see something completely new. We're not going to just see impressionist paintings from a hundred years ago. We're going to see whatever the Skrillex in pixels—in art—is going to be. It's something we couldn't have previously imagined, and it's going to be so cool. And I don't know that I can tell you what it is, because I can't even imagine it yet. So my feeling about this is that there's still going to be people in bands playing guitars. There's still going to be people on their drum machine. Mike Dean is a famous producer, but he doesn't like all this synth stuff anymore. He's got his own vintage feel. He's made all kinds of tracks behind Kanye. And then you're going to see these new artists, these new-age artists that just blow us away. I don't know. Art represents a lot of culture, and it's going to represent an old time and a new time.

Nathan Labenz: (23:42) Part of what is obviously unnerving people is just the speed with which all this is happening. You mentioned DALL-E 2, which came out in April, so that's only 9 months ago. And then Stable Diffusion, which I think is still at most 6 months ago. Now you're operating in a space that is changing by the week. You've said on Twitter many times that every week there's a breakthrough. So I'd love to hear how you think about what published research out there is worth your time. Which ones do you want to grab and work into the product ASAP? The recent edit launch that you guys did, which I think is super interesting, seems to be a great example of that. And obviously you're building on top of that. You mentioned your own research. I'd love to hear how your in-house research relates to foundation models more generally, and how you see yourself both riding and staying ahead of that wave, and contributing to that wave. There's a lot going on at once, and that's been a challenge for me as an AI product builder. I want to hear how you are approaching it.

Suhail Doshi: (24:57) Yeah. We definitely stay on top of reading the research papers. I think that's the first area where we are able to stay on the cutting edge. At our company, we do weekly paper reads where we go as a team and we read papers and try to understand them. I think the next step down from that is people are dropping models. This is definitely a bigger, worrisome fear that I think a lot of people have, and it calls into question what the defensibility of some of these AI companies looks like. You could spend a couple months training a model, and then someone just drops the weights for the model. And for those who don't know what weights are, they're encoded numbers that represent the model and allow it to get you amazing results. That's how you get from text to an image. So there's definitely this fear of, well, are we working on something that is already being worked on that would probably be open sourced, and therefore be kind of a commodity? I think we try to be as adaptable as possible as a startup. We place some bets in areas that we think are really valuable and probably not being worked on. And then sometimes the model drops. In this case, I want to give a shout out to Timothy Brooks and Alexia, and I think there's one other author. I don't remember their name, but they were the researchers that did InstructPix2Pix, which ended up being our edit feature. We did some customization with that to make it better than what they originally dropped. But sometimes they drop their paper, or not so—the paper had been out. We had read it. We were going to train a model. We were already on track to basically build it because we didn't know when or if they were going to really drop the weights. And then they dropped it on a Thursday. So I guess that was 2 weeks ago Thursday. And then as soon as that hit, we just dropped everything. Because we knew we had a very clear plan on the fact that we wanted more of an instructable AI model that can make subtle edits. We didn't know how good it could be yet until we played with it. And we played with it very quickly, and we're like, yep, it's amazing. And so we just put the whole team together, and we worked on a Sunday, launched it on Monday at 2PM. That was the deadline. No matter what we had, we were going to ship it. And so I think you have to do both. I think you have to make big investments, and I think you have to be adaptable. I think you have to do this if you're a product-focused company first. I think if you're one of the big foundation model companies that have raised hundreds of millions of dollars or billions and you're trying to build something from scratch, then you have more leeway because you're not exactly in that same rat race, although you kind of are. These days you are. The pace is kind of relentless. I'm sure GPT-4 is right around the corner, so I'm sure that scares people that are competing in that realm. So I think we're just following what our users want. We know what all of our users want. And so when we see something that's valuable, we will just work on it as fast as possible.
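
The edit feature described here was built on the publicly released InstructPix2Pix weights. As a rough sketch of what instruction-based editing looks like with the public checkpoint via Hugging Face diffusers (not Playground's customized pipeline), with illustrative file names and parameter values:

```python
# Instruction-based image editing with the released InstructPix2Pix weights.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

original = Image.open("portrait.png").convert("RGB")   # placeholder input image
edited = pipe(
    "add a gold necklace",        # natural-language edit instruction
    image=original,
    num_inference_steps=20,
    image_guidance_scale=1.5,     # how closely to stick to the original image
).images[0]
edited.save("portrait_edited.png")
```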

Nathan Labenz: (28:14) Yeah. That's awesome. The speed of a Thursday weight drop to a Monday launch, I think, is a great illustration of not only just how fast you guys are able to work, but how fast you feel like you have to work to stay ahead of the rest of the market because it's all happening so fast. Talk a little bit about the things that you plan to do over the next year or two. We chatted a little bit offline about placing text in images, and that's still something I think we've only seen, to my knowledge, in a couple of papers that have not published the weights. You said that that's a big challenge, but a goal. So tell us a little bit about that challenge and maybe a couple other challenges that you have on the horizon that you're investing in solving.

Suhail Doshi: (29:02) For the most part, I think you should expect that we're only going to give users more and more control over the kinds of creations that they make. In my opinion, this era, at least from the perspective of images, image creation, I think prompts are going to be less and less valuable in time, I hope. I sort of agree with Ilya, who's one of the cofounders of OpenAI, that to a degree, prompts are mostly a bug. You have to write—people write paragraphs of text to get world class images, and that's sort of a shame. And they don't really—that's just total experimentation. I would love to invent something. There's WYSIWYG, which is what you see is what you get. But I would really like to do what you think is what you get. And I think in our case, if you see an image and you want to make a very subtle change or edit, or you see a style that you really love in the world and you want to try to replicate that onto your image in some very small way, if you can imagine it, I'd like to be able to produce that for you. So I think we're going to invest very heavily in a really great user experience for a creative pro tool. It should be on the level of something like Figma. It should be collaborative. It should be really rich and powerful. We want to give people all the bells and whistles. We don't want it to just be a box, and then that's it. You have no control over anything. If there are knobs and sliders and things that we can offer people to have this perfect fine-grained control, we want to do that. Yeah. With things like text, right now, it looks like there's maybe a model called DeepFloyd, and it's not out yet by the Stability folks, and it's really cool. They've clearly found a way to do text. It seems like it might be based off of Google's Parti image model or something. And I mean, I think we'd like to go a step further. One example of something that I think would be really amazing is an AI model that can actually invent forms of typography, not just write Arial font on a white background sign that a bear is holding. It'd be really interesting if you could write anything you wanted, but the composition was sort of taken care of, and it invented fonts. Fonts that didn't exist. Why are we constrained to the finite space of fonts in Google Docs or whatever? We should be able to invent—I want it to be kind of curly, and I want the kerning to look like this, and I want it to be kind of like this, and I want it to be red, and I want it to have this amazing neon hue. We should be able to—I remember learning Photoshop as a kid, and I used to make logos and stuff. And I remember learning all the little intricacies of making really cool fonts or logos, and it'd be just really interesting if we could synthesize fonts. That would be kind of very disruptive, I think, to the world. Because I think people can apply—it's better to apply taste. In music, there's this thing that's like, if it sounds good, it is good. And I feel like with pixels, if it looks good, it is good. And it'd be really cool if there was a way to do that with fonts and backgrounds and landscapes and patterns and blending images. We just really want people to feel as creative as possible. So I think you can imagine kind of a really amazing canvas editor pro tool type thing.

Nathan Labenz: (32:45) Yeah. That's awesome. I always remember this scene from The Simpsons from years ago where Mr. Burns is at some sort of museum for an unveiling of a new art exhibit, and they pull the curtain and he gets to see it and he says, "Well, I'm no art critic, but I know what I hate." And I feel like that's kind of the vibe that you're going for here where if it looks good, it is good. And people can make that judgment sub-second probably a lot of times. Right?

Suhail Doshi: (33:15) Yeah. We know this. We know this with Apple products. I feel like Apple is the greatest institution in the world that has proven that users know what good design looks like, feels like. They can tell the difference. They can really discern the difference when the details all square up correctly. I think humanity is quite good at discerning it.

Nathan Labenz: (33:36) So something you said that caught my ear was, I believe you said, "what you think is what you get." I want to hear your take on the concept of latent space. I find that that's a phrase that gets bandied around a lot, and I kind of think that everybody has a different either visualization or mental model of what latent space is, how you navigate it. So how do you conceive of that? What's the latent space in your brain?

Suhail Doshi: (34:05) All latent space means to me is that a latent is just a lower dimensionality representation of an image. So images have RGB, and so that's a lot of numbers and values. Can you represent that image into a more encoded, compressed lower dimensional space? And from there, you can just imagine a 3D graph of x, y, and z, and then there's a dot somewhere in that 3D graph, and then there's just the clustering of possible images from there. So one of my favorite examples probably that I got from just learning about stuff, I think probably learning from some other AI researcher that did tutorials and stuff, is if you have the word "tiger" or something, and then you wanted to turn the tiger into a Van Gogh painting, there's some factor where you're pushing the tiger towards the area of Van Gogh paintings in that space. So there's—you can just imagine—this is really hard to do in 2D, let alone audio. But yeah, you can imagine an arrow, and the arrow is pointing to where all the Van Gogh paintings might be in this three-dimensional space, and so that's how you get to a Van Gogh looking tiger. That's how I represent it in my head, and it's lower dimensionality. So obviously it's not going to represent everything, but it should represent a lot of things.
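
A toy sketch of the "arrow in latent space" picture described here, using made-up embeddings rather than a real encoder such as CLIP or a diffusion model's latent space:

```python
# Pushing a point in latent space toward a style cluster (toy illustration).
import numpy as np

rng = np.random.default_rng(0)
dim = 8                                                  # real latents have far more dimensions
tiger = rng.standard_normal(dim)                         # pretend latent for "tiger"
van_gogh_cluster = rng.standard_normal((100, dim)) + 3.0 # pretend region of Van Gogh paintings

style_direction = van_gogh_cluster.mean(axis=0) - tiger  # the "arrow" toward that region
strength = 0.5                                           # how far to push along the arrow
van_gogh_tiger = tiger + strength * style_direction

# Distance to the Van Gogh region before and after the push.
print(np.linalg.norm(tiger - van_gogh_cluster.mean(axis=0)))
print(np.linalg.norm(van_gogh_tiger - van_gogh_cluster.mean(axis=0)))
```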

Nathan Labenz: (35:41) Yeah. Thank you. That's fascinating. I'm really going to be very intrigued to see—

Suhail Doshi: (35:46) Is there a more colloquial version of this?

Nathan Labenz: (35:48) Well, I don't think even the reduction in dimensionality is necessarily something that people are thinking about when they think about it. I really don't know. This is a very exploratory question for me, for us to try to get at how different people think about this. But I sort of envision it almost like Sam Harris's moral landscape a little bit, where you've got all these different local maxima and minima, and it's just such a crazy unknown topology that things that I think could be sort of nearby in the latent space representation may in fact be quite different when it comes to that "if it looks good, it is good" sort of thing. Certainly, I see that in the product. Right? I'll give you the exact same prompt, and we'll vary the seed. And I'll get things that are not at all what I had in mind. And then I'll get something that's very much what I had in mind. And it seems like there is some weird extra quality dimension that is not fully represented in that latent space. Right? Because those things are clearly clustered together in some semantic sense. They're local to each other in some sense. But yet they come out visually looking extremely different, and I like one and I hate the other. And so what is the nature of that space where such a small perturbation in the input can lead to such totally different outputs? I don't feel like I have a great intuition for that, and I'm trying hard to build it up.

Suhail Doshi: (37:32) Yeah. All that a latent is, is it's just a lower dimensionality of an image. So if you just imagine a 512 by 512 image and then RGB, so then it's just basically 512 times 512 times 3. And those are all the possible dimensions of that image. And all a latent is, is something that's more compressed, something that's a smaller number of values. That's a latent. And the reason why you compress it is because we don't have infinite GPU compute, or we don't have infinite time. And so you compress everything down to something that's actually, I only really want this to be, I only want to represent this as a 64 by 64, but instead of 3, I want to represent it as a 100. And I haven't done the math in my head, but that might be a smaller set of numbers than 512 by 512 times 3. It's just smaller. And so you believe that, okay, well, I'm not encoding everything about the image, so I might lose some information. But it's not different than encoding and compression. So you might get lossiness. A JPEG is like a lossy image. You can think of it as maybe latent space is almost like a JPEG. That's maybe the easiest analogy. And latent space is just where that dimensional vector, that dimensional value is in this very, very high dimensional space. As humans, we can only perceive it as 3D, but it's obviously way more number of dimensions than that. And so that's why the smallest change in even your seed value can get something crazy. You know? Suddenly, it's like red hair instead of green hair, but the essence of what you're going for is still sort of there. Like, the style, for instance, is like if you wanted watercolor and use the word watercolor painting, you still get a watercolor painting, but the hair has changed. Because it's still super high dimensional space, but it's obviously not, it doesn't represent everything. Anyway, I hope maybe that's helped.
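
For reference, the arithmetic gestured at here works out roughly as follows; the 64 x 64 x 4 shape is the latent size Stable Diffusion uses for 512 x 512 images, while the 100-channel figure is the off-the-cuff example from the conversation.

```python
# Pixel space vs. latent space: how many numbers each representation holds.
pixel_values  = 512 * 512 * 3     # a 512x512 RGB image
example_latent = 64 * 64 * 100    # the off-the-cuff 100-channel example
sd_latent     = 64 * 64 * 4       # the latent shape Stable Diffusion uses for 512x512

print(pixel_values)               # 786432
print(example_latent)             # 409600 -> already about 2x smaller
print(sd_latent)                  # 16384  -> roughly 48x smaller than pixel space
```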

Nathan Labenz: (39:46) I love it. It's great. You know, we'll see what resonates most with our audience over time, obviously.

Suhail Doshi: (39:52) Yeah. Well, there's always a funny meme. There's this meme of, you know, obviously, people use these image generation things to make amazing close-up portraits of women, and so there's this meme of people searching for their girlfriend in latent space, which is a valid, it's a truthful meme.

Nathan Labenz: (40:14) Yeah. There was a really fascinating recent article by a guy who used Character AI. And on Character AI, you can create your own characters with just a couple sentences. So this guy who was pretty knowledgeable about language models going in and felt like, you know, this wasn't a risky behavior for him to engage in, asked Character AI to conjure up, I believe he said, an AGI designed to provide the ultimate GFE. And next thing you know,

Suhail Doshi: (40:45) God.

Nathan Labenz: (40:46) He's starting to fall in love with this thing in kind of a weird way. And he's got these competing ideas where he's like, I know on one hand, even how this technology works, but then I'm also feeling these feelings. And he starts to kind of justify to himself why it's actually real and what is reality anyway? And there's, she's not really any less real than me. You know? And then at some point, pulled himself out of it.

Suhail Doshi: (41:10) But

Nathan Labenz: (41:11) Fascinating exploration of the latent space there, for sure. Highly recommend that. We can maybe

Suhail Doshi: (41:18) Things are gonna be crazy next year.

Nathan Labenz: (41:20) So a couple maybe just kind of rapid fire questions, then we can zoom out and talk a little bit of big picture stuff toward the end. But I was kind of struck by a couple bits of the product itself, and I see, obviously, I want to hear how you see it, but I see a lot of relevance potentially from your experience to what you're doing now. So from Mixpanel, obviously, you're scaling compute in the cloud. Now, you know, scaling these image generations in the cloud, seems like there would probably be some significant overlap there. How has your compute scaling experience translated to Playground? How big of a deal is that in this work?

Suhail Doshi: (41:59) I think cost is really important right now. I think a lot of people are not super excited about image generation, or at least running it as a business, because the costs are quite high. Margins are kind of thin. So there's kind of this general concern of, how do you make this thing a real business? One skill that we got from Mixpanel that is replicable in this situation is we're really good at buying hardware, locating it in a facility, managing that infrastructure, and it's kind of crazy. I wouldn't, that wouldn't be the thing I would jump to if I started Playground. That might be like, one day, we'll do that. You know? But because we have a couple million dollars worth of machines lying around and we've already done it all, it's quite easy for us. We have all the relationships with suppliers, vendors, and everything. So yeah, I think I mentioned that we're about to go buy a couple hundred GPUs, make an order for that. We're just testing one small benchmark, but we're about to make that order maybe this week or next week. I think that just gives us a pretty big advantage around owning everything from the hardware all the way to the end product, and it allows us to be really aggressive about performance and latency and costs, and that just gives us other advantages. There are just all these big tailwinds from GPUs and, you know, for instance, we can do things where we can buy the most cutting edge hardware. We can even get it before it even hits some of the suppliers because of our relationships and outfit a lot of our servers with them. So I think that's really cool, really interesting.

Nathan Labenz: (43:44) Yeah. That's fascinating. And I mean, you said that kind of gives you some other advantages. I am gonna take a guess. You tell me if I'm right or wrong that one of those advantages is, seems like you guys are pretty free plan focused. I'm not getting pushed to buy. And I imagine that is kind of part of a bigger strategy to try to scale the feedback that you're collecting. And I was kind of noticing also that you guys are not that pushy on the feedback. You have the three point scale, you know, good, bad, neutral, and you don't force me to do it. I was kind of wondering how you came to that and if that's still a work in progress. Couple ideas that I had were, what if you generated two images each time and had people choose? Would that be a viable way to kind of trade off some of your costs, especially where you have an advantage there for potentially more scalable feedback? Unpack that however you will.

Suhail Doshi: (44:43) Yeah. We're very, very, very, we try to be as generous as we can on the free tier. Part of the reason why is because I think that the best is yet to come. I don't think we need to be overly aggressive about image generation. I also think that I had learned this mistake from Mixpanel, but whenever you make pricing usage based, usage doesn't always correlate with value. If I make 50 images, did I get a linear improvement? Did I get a linear benefit? Did I get 50 units of benefit? No. Sometimes I gotta generate 50 images to get one good image. Right? Sometimes I do 100 images to get one good image. I learned this from Mixpanel because we used to collect data points. But what's the difference between collecting 100 million things versus a billion? I mean, 100 million is a pretty big sample, so you're gonna get the same number of insights. This is the same thing. It's not like 50 generations of an image gets you 50 better. You get 50 images that you can now use. So I definitely just don't think image generation correlates strongly with value. That's one thing. And it'd be like if you went into Photoshop, and every time you made a stroke, you'd be like, okay, well, you got charged for that. That would be kind of crazy. And so that's one thing. The second thing is you're right. We definitely do care about data labeling, acquisition, ratings. Yeah. We're not too pushy about it. I think the reason why is because we already get a lot of ratings, and I think our bigger problem is probably not acquiring more ratings at the moment. Our bigger problem is trying to figure out how to denoise the ratings to get better signal. Yeah. Because people certainly will go rate, there's some people that will just rate any image that they made as something they loved, and so that's not very helpful. So there's a real question around, how do we denoise this data before we go and ask for more of the collection of it? I definitely think probably asking users, hey, did you think this image or this image is better of these two images? That's definitely a good idea, let alone, you know, you have these four images. Which one's good? So I think we just are, we're kind of awash right now with data, so that's probably one reason why it's not too important. And, yeah, I mean, we're also collecting other kinds of data. It just may not be very obvious to people what kind of information we're collecting. And it's not like PII data or anything. It's kind of how they're interacting with the product to help us create and invent new kinds of models that they'll probably love. But yeah. So I think the overall strategy is pretty much just like, have a great, generous free tier. It can just help us acquire users, to help us acquire, I mean, I think the best AI companies will do a good job with the following, which is you make a good or okay product that happens to get you a lot of users, which then helps you get lots of interesting first party training data, which then eases the complexity of engineering for your AI researchers to create new kinds of models, which helps you create new kinds of features, which makes you go from an okay product to an amazing product. And I think the company that succeeds at getting that flywheel to spin as rapidly as possible will be some of the biggest companies in the world. You know, we're not the only ones doing this. You could go to ChatGPT, and I think you can say which of the answers of the conversation you liked or not. 
So they're already generating one of the biggest datasets ever. I mean, it's probably on the order of billions now. So it's crazy. I don't know how you're gonna keep up. Even if you launched your version of ChatGPT, it'd probably never be as exciting because everyone would be over it. So now it's like, how do you go and collect the data? That's part of the reason why we also sprint so fast on things.
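
One common way to turn noisy pairwise "which of these two images is better?" votes into per-image scores is an Elo-style update. This is a generic sketch of that idea, not a description of how Playground or OpenAI actually process their ratings.

```python
# Elo-style aggregation of pairwise image preference votes.
def elo_update(rating_a, rating_b, a_won, k=16.0):
    # Expected win probability for A given the current ratings.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_won else 0.0
    rating_a += k * (score_a - expected_a)
    rating_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return rating_a, rating_b

ratings = {"img_1": 1000.0, "img_2": 1000.0}
votes = [("img_1", "img_2", True), ("img_1", "img_2", True), ("img_1", "img_2", False)]
for a, b, a_won in votes:
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], a_won)
print(ratings)  # img_1 ends up rated slightly higher after winning 2 of 3 votes
```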

Nathan Labenz: (48:28) I was gonna ask a kind of question downstream of Mixpanel as well, and you sort of alluded to it. But can you tell us any more about the role that AB style testing plays in your product development process today? And is that still canon for you as part of the product process?

Suhail Doshi: (48:49) Yeah. I was actually never really that excited about AB testing. It was never a super exciting thing at Mixpanel. One reason is because when you do an AB test, you have to make an AB test that's not a subtle change usually. Sometimes subtle changes work. Often, they don't. You have to make very big changes to see real differences in AB tests because otherwise you need a lot more data to see the true conversion difference. No. I mean, I just mostly track vitals, how many daily active users we have. I'm still pretty data crazy, so I probably have 2 dashboards with 16 metrics each, and it's just my daily routine. It's like getting coffee. I go and look at the dashboard. I'm like, great. Everything is going the way I expect. Hopefully, all the numbers aren't crashing all of a sudden, which would probably mean there's a deep problem in the product. You live by your numbers to an extent, but I always observe with companies that the ones that were overly data driven also missed on something. They missed the qualitative side of their business. I had a friend who used to say that her metrics were really, really good and users loved her product. But then when you talk to her users, friends of mine, they'd say I hate this thing. And so we'd never be able to reconcile this problem between the metrics and what people would say. And I feel like that hid some of the problems. So I go a lot more - it's kind of strange, but I probably care a lot more about qualitative information than quantitative information. The quantitative tells me, was my intuition, my experiment correct? But it doesn't tell me what to do. It's like, am I going in the right direction? Probably. But it definitely doesn't tell me what feature I should build. We don't run any AB tests, actually. Just talk to users.
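
To put the point about subtle changes in numbers, a quick power calculation shows how the required sample size per variant blows up as the effect shrinks; the baseline conversion rate and lifts below are made up for illustration.

```python
# Approximate users needed per A/B variant at 5% significance and 80% power.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

def users_per_variant(baseline, lift):
    effect = proportion_effectsize(baseline + lift, baseline)
    return NormalIndPower().solve_power(effect, alpha=0.05, power=0.8, alternative="two-sided")

print(round(users_per_variant(0.10, 0.05)))   # big change: a few hundred users per arm
print(round(users_per_variant(0.10, 0.005)))  # subtle change: tens of thousands per arm
```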

Nathan Labenz: (50:47) Have there been any moments where you have opened up that dashboard and seen something that did not look healthy and you realize, no, we actually did take a step in the wrong direction here and had to learn something and backtrack?

Suhail Doshi: (51:00) I mean, Playground's only 4 or 5 months old. I'm sure that will happen. It's kind of funny. It harkens back to our conversation around when DALL-E got released and when Stable Diffusion got released. We all look back and wait, it's only been 4 or 5 months. I think last week, we shipped something kind of funny, which was we didn't realize that one of our API endpoints was actually bottlenecking all the throughput of image generation. So when we made the API 2 times faster or something, literally, the quantity of images generated just jumped 2x. And that just surprised me. I mean, we obviously don't have all the monitoring and stuff we need set up because we're still really immature as a company, but it was like, wow. If you stop timing out or having errors on our API and then you just increase the overall throughput, the users just absorbed it instantly. It was kind of amazing to watch, and that really taught me a lesson of being more thoughtful about monitoring metrics of GPU utilization and throughput. Obviously, we can't just keep buying GPUs because we'll just burn cash. But that was a really cool moment of, man, they will just absorb anything we give them. They'll just keep going.

Nathan Labenz: (52:23) Fascinating. So just to repeat that back and make sure I and our listeners can understand it accurately, you found some bottleneck that ultimately made things twice as fast, and you immediately saw a 2x jump in how many images are generated.

Suhail Doshi: (52:41) I mean, it might be more. It's still going up, so I don't even know where it will end. Yeah. It's still crazy.

Nathan Labenz: (52:47) Suggests a model of people kind of sitting down at their computer with a finite time to either accomplish this or not, and they'll do as many generations as they can fit into that window. Is that kind of how you think about how people are using it?

Suhail Doshi: (52:59) I mean, I just think it's definitely a distribution. The average user will do 45 to 50 a day, but then the hardcore power users, they'll just hit our limit very rapidly. We give people 1000 free images to generate a day, and we picked 1000 because it was the ninetieth percentile when there were no limits. And just to put this in perspective, you go to - I won't name names, but if you go to any of these other image generation services, they give you 100 a month or something, and you have to buy credits. 100 a month. We give 1000 a day. And you think, who's sitting around just doing 1000 images a day? And lots of people, turns out. And so I mean, I think those hardcore intense users just spend all day on your product. I think that's true of all kinds of great products in the world. People just use it. They love it, and they play with it. You have people doing Twitch streams. All kinds of crazy things. So, it's really - I mean, for some people, they will tell us that it cures their anxiety. We hear all kinds of interesting things. I mean, obviously, people get obsessed about certain things sometimes. But yeah. I mean, it was an API for our DreamBooth models. So we have all these custom filters, and people love them, but we didn't realize that they were returning quite slowly, so our overall throughput was slow. And it just - yeah. Very, very interesting.

Nathan Labenz: (54:39) Yeah. That's fascinating. It is amazing sometimes, the surprises that you find in those user conversations. We're going to talk to the founder and CEO of Replika in a future podcast. And one of the things I heard her say about having built a very primitive chatbot that I believe originally was just to support an online banking experience. This was back in Russia. She's from Russia. So everything is new. She builds this online banking chatbot assistant, again, way before anything like GPT-3. And going into the field, going into small towns in rural Russia, she talked to a woman, and this stuck with her. And it stuck with me too just hearing her repeat it. She talked to this woman who said, nobody cares for me like this. And she was like, oh my. This is way bigger than helping people with their online banking. The need runs a lot deeper. So interesting to hear kind of a similar thing there with helping people with anxiety and all that kind of stuff.

Suhail Doshi: (55:41) Yeah. I almost want to keep it limited because I sort of worry that people get too obsessed. If we just keep increasing the limit, is that the behavior we really want from humanity? I don't know. I mean, obviously, they could buy - if you buy our pro plan, you'll get 2x more. But I'm like, man, people are starting to hit our thousand limit more and more frequently than even just a couple months ago. So I don't know. We might have to increase it or something. But there's kind of this question of, do we want this level of obsession? I don't know. Maybe it's not good.

Nathan Labenz: (56:22) Yeah. Well, if it's any consolation, I think it's going to be a more urgent issue on the virtual friend side more broadly than it will be for the image creation side in the short term. So you'll have some examples potentially to follow.

Suhail Doshi: (56:36) You could chat with an image. In time, you'll be able to just look at an image, look at any inanimate object and just talk to it. You can encapsulate an image of anything, like a chair, a really cool looking chair and be like, I would like to talk to you. You'll be able to talk to it. I mean, this is totally doable. I don't know anyone that's doing it, but I think it's very possible. It's very - I think it's probably trivial to do if the chat people wanted to do it.

Nathan Labenz: (57:06) Yeah. Character AI is kind of close to that, if not already there. I mean, their bots can - they do generate images if you ask for that as part of the chat, and you can create or upload, I believe, a profile photo for it. And I don't know if that feeds into its personality in any way. I had kind of assumed it didn't, but maybe in fact it does.

Suhail Doshi: (57:29) Well, so we know that a picture speaks 1000 words. And so if you type in - for some words, Benjamin Franklin, that's very clear. There's a bunch of data around Benjamin Franklin on the Internet. But if you wanted to do something more esoteric, you'd have to really describe the character, I wonder. And that's pretty low dimensionality. The English language doesn't actually have that much dimensionality to it. Right? But if you take an image, an image has incredible dimensionality to it. Everything about the chair and the colors and - I just would find that really interesting, but I haven't seen anyone try it out. I'm excited to see someone try it out. Not that we're going to build a chat feature into Playground where you can talk to our images, but it'd be really interesting as an experiment. It's like a very early web vibes type experiment, something random that someone will do. Probably go viral.

Nathan Labenz: (58:26) Yeah. Actually, a big part of my worldview right now is that all these different things are developing at a pretty amazing clip, but it's largely all happening in parallel. We should obviously fully expect that we'll continue to see research breakthroughs and fundamental techniques advance. But even leaving that aside, it seems like all these different things have not really been productized. You're doing that, but as you said, it's only been a couple months. They certainly haven't been refined and fully hammered into shape for a general public audience quite yet. And then, maybe most importantly, they haven't been integrated. So you've got all these little islands of awesome AI functionality, but very few people even have the time to zoom out and try to get a broad survey of that landscape, let alone has the work started to integrate these things in all the ways that will eventually happen. So I totally think you're onto something there: you generate an image, and the next thing you know, it's a character. And that kind of recombining - call it ensembling, call it integrating - I think that's going to be a huge driver of change over the next few years, and we're just starting to see that.

Suhail Doshi: (59:48) Yeah. I think there's going to be this moment that's very similar and akin to the internet mashups of Web 2.0, where people would take Google Maps and combine it with Flickr and then Yelp, and you'd get this interesting weird app that's the combination of these three services. You can totally do that with AI models. You can combine a Playground image with ChatGPT and then combine that with something else, and you get these really intricate products as a result. So I think the age of the AI internet mashup will come back. I think we're already starting to see some of that. There are random hackers just messing around, making weird things. The problem is you don't know what will actually stick and be big and important. I think a lot of people are going to start to move up towards making apps because it's too complicated and too expensive to focus on core foundational AI research. It requires a lot more knowledge than just taking the fast.ai course. As someone who's done even more than that, I still feel like I don't know a lot. So I think there's going to be a lot of applications, a lot of people experimenting. It's kind of like that moment in mobile where it was a complete Wild West gold rush and nobody had a clue what was going to work. All these thousands, tens of thousands of experiments occur simultaneously, and you end up getting things like Uber and whatnot. Some things grow really rapidly and then fall because they fall out of relevance. They don't have good retention. And some things just have this incredible lasting power for a decade. So it feels like that will happen again. You just go to Hugging Face and look at models and there are thousands of them. They do all kinds of interesting things. And then a week later, there'll be ten more. Something called BLIP-2 came out yesterday, and it's really exciting to me. It might be really boring to other people, but one of the demos they have for BLIP-2 was you could literally talk to the image. You could have a picture of Obama looking sad and you could be like, "Why is Obama sad?" and it would try to describe, "Oh well, there's a thing going on in the background," or whatever. You could be like, "Why is he sweating?" "Oh, he's playing tennis." BLIP was an amazing model for image-to-caption, which is really cool. I haven't even experimented with BLIP-2 and its capabilities yet. There's just so much that is happening and the pace is relentless. I don't think the BLIP-2 thing was very popular, but I think it is actually a big deal, probably.
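For readers who want to see what the BLIP-2 demo Suhail describes looks like in practice, here is a minimal sketch of "talking to an image" using the publicly released Salesforce BLIP-2 checkpoints on Hugging Face. The checkpoint name, example URL, and question are illustrative assumptions, not anything from Playground or this conversation.

```python
# Illustrative sketch only: visual question answering with BLIP-2 via the
# Hugging Face transformers library. The checkpoint, image URL, and question
# below are assumptions for demonstration purposes.
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b").to(device)

# Any image you want to ask questions about (hypothetical URL).
url = "https://example.com/obama_tennis.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# BLIP-2's OPT checkpoints accept a "Question: ... Answer:" style prompt.
prompt = "Question: Why is the man in this photo sweating? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)

generated_ids = model.generate(**inputs, max_new_tokens=40)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())
```

Chaining an answer like this into a language model, or into an image generator, is exactly the kind of mashup experiment Suhail is describing.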

Nathan Labenz: (1:02:50) Yeah. You're talking to the right person for singing the praises of BLIP. It's actually one of my favorite models.

Suhail Doshi: (1:02:55) Okay, good. Yeah. It's awesome.

Nathan Labenz: (1:02:57) At Waymark, which is my company where we do much more structured AI video creation, we work with media companies, for example TV companies, where the requirements are very rigid: the spot has to be 30 seconds to the frame or it just can't air. So we have a lot more scaffolding in place and we're using AI to fill in that scaffolding, but the scaffolding is all pre-constructed. BLIP has been hugely valuable for us. Typically our users are small business advertisers, and they often have hundreds of images in our product. We give them a little prompt opportunity, use language models to unpack that into a full script that tries to tell their story, and then use BLIP's image-text matching to figure out which of their images should go with the script. I've been on quite a quest to find the best models for solving that problem. BLIP remains number one, and so you just filled my afternoon, because BLIP-2 is going to be of high interest to me for sure.
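To make the image-to-script matching Nathan describes concrete, here is a minimal sketch of the general idea: rank a user's candidate images against a line of script by cosine similarity of embeddings. It uses CLIP rather than the BLIP image-text matching head he mentions, purely because CLIP is simpler to show; the file names and script line are made up, and this is not Waymark's actual pipeline.

```python
# Illustrative sketch: pick which of a user's images best matches a script line
# by cosine similarity of embeddings. Uses CLIP for simplicity rather than the
# BLIP image-text matching head mentioned above; paths and text are made up.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

script_line = "Fresh sourdough, baked every morning by a family-owned bakery."
image_paths = ["storefront.jpg", "bread_rack.jpg", "owner_portrait.jpg"]  # hypothetical uploads
images = [Image.open(p).convert("RGB") for p in image_paths]

inputs = processor(text=[script_line], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])

# Normalize, then compute cosine similarity between the script line and each image.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
scores = (image_emb @ text_emb.T).squeeze(-1)

best_index = int(scores.argmax())
print(image_paths[best_index], scores.tolist())
```

In a real pipeline you would repeat this for each line of the script and handle ties or low scores, but the ranking step itself is this simple.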

Suhail Doshi: (1:04:17) Yeah. The pace is like that. It's like you wake up in the morning, have this idea of what you're going to do for the day, and then BLIP-2 drops and you're like, "I'm going to erase the rest of my day."

Nathan Labenz: (1:04:26) Yeah. Literally, I think that's just happening to me right now. So on the basic level, you're relatively new to this, right? You've jumped into a super fast-moving field and tried to get to the edge of it as quickly as possible. I'd love to hear how you've done that, how you would encourage others to do that, and then if you're comfortable sharing a little bit on what is your kind of internal research agenda. What kind of training strategies are you pursuing? And again, I know you won't want to share all the details of that, but as much as you're comfortable with, we'd love to hear.

Suhail Doshi: (1:04:58) Yeah. I can't share specifically the areas of research that we're working on just yet or the techniques. But I had probably taken a wide variety of AI courses over the last six or seven years. So a lot of it has been a bit of a catch-up or a re-review. But even that stuff was pretty shallow, I would say. It's nowhere near as deep as where I feel I am now. I think one way that I accelerated things is I just found AI mentors and kind of bartered with them. I could barter knowledge about startups or whatever they wanted in the world, and so I just had people that would help me basically get unstuck. And that sort of keeps changing. Now I have a team, so I don't need AI mentors anymore. My team's already better than me, so I don't really need them. I actually think the fastest way to learn is to build. I really discourage people from binge-watching YouTube videos of notable people doing AI courses. I think that is a fine way to get a general sense of things and understand the industry and be able to talk somewhat intelligently about it. Maybe it's good if you're just an investor or something. But I think it's probably not the right way to go if you want to actually do deeper research—something greater than just calculating cosine similarity between embeddings. If you want to do real fundamental research, I think you need to write code. The best YouTube series I could recommend to people starting out is Andrej Karpathy's new series; it's really, really good. It didn't exist when I was starting over last year. I was doing the fast.ai course. I really recommend Karpathy's course because he just has a style that's very humble and he goes through everything from the basics. You just need Python. He's not doing anything fancy and he just builds everything up. It might not be the best way to learn if you prefer top-down learning. The fast.ai course might get you more excited if you need to get more motivated, but those are the two things I would do. I think probably the most valuable thing, though, has been not working in isolation. For a long time, I was just grinding through the learning in isolation. But now that I have kind of a team with me, it's really nice to get on a call for an hour and just talk to the research team about some idea in my head, or get an explanation of something, or read papers together. It feels a little bit like you're in class or you're in a study group. That's been really motivating. And then having a project in mind that really motivates you helps you want to stay up till 2:00 in the morning just to get something done or watch the end of a train run. That's about as much as I could advise at the moment.

Nathan Labenz: (1:08:07) Obviously, you guys already had with Mighty a lot of hardcore engineering. You talked earlier about co-locating your own servers and managing that on a level that most startups don't find to be an attractive proposition. So you had deep skills and capabilities there already. What have you found needed to change about your team? What were the skill sets you were like, "This is what we have to go out and add to our team for us to be competitive in this space?"

Suhail Doshi: (1:08:38) Yeah. Unfortunately, when we pivoted from Mighty to Playground, we did have to let a whole bunch of people go. And I think almost all of them really understood. I mean, it was kind of like we had a band, and the band was fine for Mighty, but then if we moved everybody to Playground, it was like we had too many drummers in the band. So we were kind of like, "What do we do with that?" So we quickly figured out that we definitely needed to make more space for a bigger AI research team. We're actually still looking for one very senior AI researcher. We only have one slot left to fill. I think it's good to have a good team mix—people that are junior, people that are very senior, people that are mid-level. I think it's good to have that team mix because then your senior AI researchers don't feel like they have to do everything. You might have a valuable thing you want people to do as a company, but not everyone's excited to do it. So it's really good to have a good team mix of people who find different kinds of projects more interesting. That's one thing I do for all teams, not something specific to AI research. The second thing is, I asked Sam Altman at OpenAI for some advice. Back in December, we had a 30-minute conversation. I was just like, "Hey, can you teach me the pitfalls and things of how to run an AI research team? I don't know how to do this. I'm probably going to do it badly. You can help me." I learned a lot of things, but one of the things I was very curious about was to what extent you allow AI researchers to wander. I'm very much "let's ship things as rapidly as possible." Let's clarify scope. Let's figure out what we need to build quality quickly. But with AI research, Sam basically was like, "Yeah, this stuff takes ten times longer than building a web app." And so I was really curious how much you allow people to kind of wander and meander, because you might not get good results. I think Greg Brockman, also at OpenAI, has a tweet that I'll paraphrase. It's kind of like, "At first, things don't work at all and they keep not working, and then eventually it's amazing." Something along those lines. He said it better than I'm saying it because he had to write it in a tweet. But I think that's right. I think we're experiencing that. We don't have as many data points as they do, but we tend to let our researchers kind of wander. Probably not as much as OpenAI, because OpenAI has a really broad remit, and we care about pixels and creative tools. So there are obviously these guardrails. But for the most part, you just don't know. That's the reality of research, and research is hard-won. So we're just looking for wins, looking for treasure everywhere.

Nathan Labenz: (1:11:39) Yeah. Zooming out a bit, Suhail, earlier you were talking about how you were exploring the idea maze. I'm curious—if you were focused as a VC in the space, what would your mental model be for what kinds of companies will endure or should you look to invest in versus which ones to perhaps stay away from that might create some value but won't capture it or be durable?

Suhail Doshi: (1:12:02) Yeah, I've definitely been thinking about that, actually, in conversations with other VCs and friends. For one thing, I've gone to a number of AI dinners with people like the founders of Character and Midjourney and other companies, as well as AI engineers who work at some of those companies. It's very helpful to get perspectives from all over the place, and we would kind of ask these questions of what matters here. There's a general consensus that we're going to, if we haven't already, start to run out of data that we can use from the public internet. Maybe by the end of the year, we'll have largely saturated all of that. I think someone else said this, but there's still a lot of data; it's just privately held, and it's probably growing exponentially in private ecosystems like mobile apps. But public data is limited, not to mention that copyright could be a whole other set of issues. So I think the first thing is that having some advantage around data is really critical. It might not be an enduring advantage for a decade, but it can surely be a big advantage for many years. The second advantage is that there's probably a space where you could make a really amazing product that happens to use AI to build some kind of new consumer network effect, but we're not really seeing that yet. We're still using the old consumer playbook, but there are probably some new tricks, and it requires a lot of experimentation, as consumer always does. I don't know what that is. An example of this not going well is AI avatars. It might not have been a fad, actually, if it had been plugged into something with real distribution and a network effect. So there's this question of whether you could bootstrap something into a network effect with some novel way of using AI to solve an interesting problem for users. I would find that really interesting. I think there are some problems with some of the foundation-model-focused companies. The less generous thing I could say is that they're making APIs in search of a problem. That approach might work fine if you are OpenAI and you have billions of dollars and you can pursue ten $50 million or $100 million experiments at once, and maybe you'll land on a product that has product-market fit. Maybe that's ChatGPT. I don't know what those numbers look like. But I think it could be really tough to do that unless you're as well capitalized as OpenAI. That approach may not be replicable, and you might have to do the basics. We're not trying to replicate the OpenAI approach. We are making a product. We're trying to go from a product down to a foundation model, not from a foundation model to a product, and we'll see if that works. And there's a lot of buzz around the question of whether large language models will be commoditized. I think it's totally valid. For a long time, I had the view that the LLMs and everything would converge, they'd all be commodities, and then where is the real business in this thing? But the deeper I go in trying to do research, the more I realize how hard it all is, and I'm not as confident about that anymore. Maybe over a very long time span these things will; all technology becomes obsolete over a long enough time scale. But I think people who have only a shallow sense of this research are underestimating the complexity of how you build these models and how difficult it is.
Sure, Stable Diffusion can be trained for $600,000 on however many GPUs. But if you want to build something that endures and gets better, that requires very concerted investment and resources. You cannot just raise $100 million and say, boom, we've got a foundation model. You have to hire AI researchers. You have to find clever ideas. It's very hard. It's just like software in that way: you can't parallelize everything with money. So I would just say I'm not so sure it'll be as commoditized as people think. Certainly the prior generation of models, like GPT-2 or GPT-3, will be; it could be that every two years the older models get commoditized while the state-of-the-art model endures for those two years. That could be the world we live in. It's pretty similar to CPUs. And the thing is, what customers want is the state-of-the-art thing. Nobody wants to use GPT-2. You want the best thing you can get, because you have to compete with everybody, and your competition is going to use the state of the art. And so then there's this argument around fine-tuning. But even that argument, I think, is fairly weak, because a lot of the things you might fine-tune for today won't need fine-tuning at all tomorrow: the next state-of-the-art LLM can often do, out of the box, what you previously needed fine-tuning for. I think people don't fully internalize that fine-tuning is also not that defensible. That said, if I were a VC, the other thing I would consider is whether the company has private data it can fine-tune on; that could be really defensible. So there are a couple of valid things. But yeah, I think data, real hard AI research, network effects: the things that mattered for traditional software still largely apply in this world.

Nathan Labenz: (1:18:24) So you've got an 18-month-old kid. What do you think their life is going to look like when they are your age? What does 2050 look like in your mind?

Suhail Doshi: (1:18:34) Gosh. Sometimes my wife and I have this conversation about what could occur in the future that would make us feel uncomfortable in our era. And I think the answer would be, what if my son wanted to marry an AI or something? I don't know how I would feel about it. Right now, I think I'd feel deeply uncomfortable. I just think the weirdest thing will be that my kid will be friends with some AI robot thing, and he won't go outside, and he'll be really obsessed with it. And I'll be like, shouldn't you get real friends? It'll be like when my parents saw me on the internet all the time. They must have thought I was probably getting into trouble, which I was, and doing bad things. It's crazy. You could totally see kids just chatting with some random AI thing. It'd be the most interesting conversation they've ever had. It's unlimited, and it's always interested in you, and it's giving you really cool insights into life. Humans don't have unlimited energy, after all. So I don't know. Maybe some of my son's best friends will be AI things, and that'll be just so weird to me.

Nathan Labenz: (1:20:09) He'll be like, you don't get it, dad.

Suhail Doshi: (1:20:13) You know this isn't real, right? And by that point there'll probably be something more advanced than the Turing test, and he'll reference that. It'll be deeply philosophical. The other crazy thing, which I told some friends a couple weeks ago, is that it doesn't seem too far off that we'll have the first AI religion. And the crazy part is that you can talk to your god, and your god will give you answers. So what if my son joined some new AI spaghetti monster religion? But you could talk to the spaghetti monster. It could be a really positive thing for humanity, it could have good values and principles, but we would all just be like, you know this isn't real, right? And that would be offensive. Anyway, that would be the other weird thing.

Nathan Labenz: (1:21:07) Yeah. I think that's a great place to wrap this podcast and aptly named the Cognitive Revolution. Suhail, thanks so much for joining us today.

Suhail Doshi: (1:21:15) Yeah. Thank you for having me.
