GPUMaxing with Dr. Ronen Dar of Run:ai

Nathan Labenz interviews Dr. Ronen Dar on optimizing GPU usage for AI, securing compute capacity, and the global chip production outlook.

Duration: 1:06:16



Video Description

Nathan Labenz sits down with Dr. Ronen Dar, CTO and co-founder of Run:ai, an Israel-based company that helps enterprises train and deploy AI models by optimizing GPU usage. The discussion covers how chip makers can meet the soaring demand, the best practices companies can use to secure compute capacity, and geopolitical dynamics. If you’re looking for an ERP platform, check out our sponsor, NetSuite: http://netsuite.com/cognitive

RECOMMENDATION: The AI Scouting Report
Playlist Parts 1-3:
https://www.youtube.com/watch?v=0hvtiVQ_LqQ&list=PLVfJCYRuaJIXooK_KWju5djdVmEpH81ee

TIMESTAMPS:
(00:00) Episode Preview
(00:51) Introduction to Dr. Ronen Dar
(03:30) Run:ai's technology and what differentiates it from other solutions
(06:00) Today's market compared to when Dr. Ronen started five years ago
(13:40) Run:ai on market competitors like MosaicML
(14:55) Sponsors: NetSuite | Omneky
(22:00) The process and best practices by which companies secure compute capacity
(25:00) Dr. Ronen explains the GPU shortage
(31:50) GPU solutions
(36:00) Relative pricing across major providers
(41:00) What other chip makers are going to be relevant?
(49:00) Global outlook for chip production
(52:45) Worldview around the US-China AI race
(58:00) Can controls on hardware actually control access to AI?

LINKS:
https://www.run.ai/

SOCIAL MEDIA:
@labenz (Nathan)
@ronen_dar
@runailabs (Run:ai)
@cogrev_podcast

SPONSORS: NetSuite | Omneky

-NetSuite provides financial software for all your business needs. More than thirty-six thousand companies have already upgraded to NetSuite, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you’re looking for an ERP platform, head to http://netsuite.com/cognitive to defer payments on a FULL NetSuite implementation for six months.

-Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that *actually work*, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.



Full Transcript


Dr. Ronen Dar: 0:00 Then we saw how difficult it is to train models on a lot of GPUs doing distributed computing. So our goal, it's still our goal, was to simplify that, to simplify the way data scientists can train big models. Right now, you have these open source models where you can train them on your data and fine-tune them, and you don't need tens of thousands of GPUs to do that. Open source LLaMA 2, it's free for research and commercial use, as they say. So that's really exciting.

Nathan Labenz: 0:35 Hello, and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together, we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost, Erik Torenberg. Hello, and welcome back to the Cognitive Revolution. Today, I'm joined by Dr. Ronen Dar, CTO and cofounder of Run:ai, an Israel-based company that helps enterprises train and deploy AI models by optimizing their GPU usage. With companies around the world racing to adopt next generation AI systems, we first discuss Run:ai's technology and what differentiates it from other solutions in the market, before getting Ronen's perspective on today's market for AI chips: from the ongoing shortage, which he expects to go on for a while at least, to the process and best practices by which companies secure their compute capacity, to the relative prices across major providers, the prospects for chip makers other than NVIDIA to meet the soaring demand, the geopolitics of chip production, including the view from Israel on the US-China rivalry, and finally, the prospects for so-called compute governance to effectively control the pace of AI development, whether in China or anywhere else around the world. As I mentioned on our July 4 episode, one goal we have for the second half of the year is to speak to more researchers, builders, and entrepreneurs who are working on AI outside of the United States. And this conversation with Ronen, who combines deep technical expertise with a global strategic outlook, was a great first step in that direction. If you have any other suggestions, please do let us know. You can always email us at tcr@turpentine.co or DM me on Twitter, or should I say X.com, where I am @labenz. 
Finally for now, I encourage everyone who hasn't already to circle back to my AI Scouting Report, which is posted exclusively on our YouTube channel. It really is the best way that I know to get up to speed on the fundamentals of AI progress and the most important recent development trends, and it's been gratifying to see that the YouTube audience seems to agree. With that, I hope you enjoy this enlightening conversation about the physical substrate powering the AIs of the future with Dr. Ronen Dar of Run:ai. Ronen Dar of Run:ai, welcome to the Cognitive Revolution.

Dr. Ronen Dar: 3:08 Hey, Nathan. Good to be here.

Nathan Labenz: 3:10 So I'm excited to talk to you about all things AI workloads, GPUs, orchestrating them, and the future of where we're going as really a global AI user base and market over the next year or 2. For context, you are the CTO and co-founder of this company based in Israel, Run:ai. So for starters, tell us about the company and what sort of AI workloads you're orchestrating for folks.

Dr. Ronen Dar: 3:38 So, Run:ai. As you said, we're based in Tel Aviv, Israel, but we have people all around the world. We have offices in New York and in California as well. And as you said, I'm now in California, so I spend a lot of my time here in the US. Run:ai started in early 2018, and we've been running since then. We're an AI infrastructure software company, and we help companies to train and deploy AI models. We do a lot of stuff around orchestrating AI workloads and orchestrating GPUs. And I think we bring a lot of technology very close to the GPUs, to the GPU level, and to the cloud native world, the Kubernetes world. We have technologies that help in optimizing the usage of GPUs and optimizing how workloads are being orchestrated and scheduled in cloud native environments. And we bring a lot of tools for ML and IT teams to easily deploy and train their models. We're working with big enterprises right now. Our solution is quite horizontal, so we're working with big enterprises, with companies like Sony as customers. We're working with the finance industry. We're working with research institutes like MIT. And we see these exciting things that are happening in our space. We've seen it for the last 6, 7 years, and major stuff is happening right now. So it's very fun to be in the place that we are right now.

Nathan Labenz: 5:27 Maybe give us a little bit more of the history, because it's funny: it's only been 5 years since you founded the company, but obviously a lot has changed. The types of models that were available to run and the scale of training runs that people were attempting in 2018 is obviously much smaller than what we have today. So what was the market that you anticipated going after that inspired you to get involved in starting a company? And then how has the actual market as it's shaped up deviated from your expectations?

Dr. Ronen Dar: 6:04 Yeah, that's a great question. So in fact, we started in 2018, and when we started, it was all about computer vision applications. I think the big breakthrough in the industry for machine learning and deep learning happened around 2012, 2013 with the ImageNet competition and researchers at the University of Toronto training AlexNet and making this big breakthrough in how machine learning models, deep learning models, can get insights from images and videos. And back then, AlexNet was trained on 2 GPUs. That was the big breakthrough back then in 2012. So 2 GPUs to train the state of the art models from scratch. And then we saw an interesting trend of people training bigger models with more parameters, using more data, and using more GPUs. And so a few years later, in 2015, ResNet came out, and ResNet was already trained on hundreds of GPUs. So we saw this trend of computer vision models being scaled from a few GPUs to hundreds of GPUs to train the state of the art: a 100x increase in the requirements to train state of the art models back then. So we started, I mean, back then we saw how difficult it is to train models on a lot of GPUs doing distributed computing. So our goal, it's still our goal, was to simplify that, to simplify the way data scientists can train big models, to allow them to train huge models very easily, with just 1 click to run their own distributed training. So we started with that, and then we saw that there are huge problems not just in distributed training, but also in how GPUs are being utilized, how GPUs are being orchestrated, and how MLOps teams are getting access to GPUs. So we saw a lot of inefficiency there, a lot of complexity in just getting access to GPUs. And we solved those problems as well. As I said before, we built a lot of technology around that aspect of just getting access to GPUs and utilizing them efficiently. And then I think we saw, in 2017, 2018, the Transformer breakthrough in natural language processing. 
So we saw again the same trend that we saw in computer vision. People trained state of the art NLP models, language models, in 2018 on hundreds of GPUs. I think GPT-1 was trained on hundreds of GPUs. Then GPT-4, which is closed source, so this is just according to rumors: GPT-4 was trained on more than 20,000 GPUs. So in a 4-to-5-year time frame, we went from training state of the art models on hundreds of GPUs to tens of thousands of GPUs. So another 100x increase in the computing requirements for training state of the art models. Models become bigger, more data, and more compute. So we see this trend once again, and we see now a lot of companies seeing what can be done with generative AI and language models, and they are actually trying to bring those capabilities into their organizations. For sure, it's going to transform industries, and people see that, and I think people need to take action, figure out how their business, how their industry is going to adopt AI, and work on that. So we are helping our customers to do that. Now it's about large language models and how to train them. Other people are still training their computer vision models, models that are much smaller than those language models. But we're happy to work with those models as well.

Nathan Labenz: 10:28 So where do you play in the overall stack or architecture of all this stuff? And I mean that maybe in a couple different ways. Conceptually, there is obviously a significant divide between training and inference. And then also there's vertical integration. You partner with NVIDIA. I wasn't quite able to figure out, are your customers primarily managing their own physical computing resources and you're a software layer that complements that? Or do you have deployments across all the major public clouds? Describe how far the tentacles reach.

Dr. Ronen Dar: 11:09 So we can run our platform wherever the customers want their GPUs to run. We have customers who run our platform on premises, and we support that. And we have a lot of customers in the cloud managing small clusters of GPUs with tens of GPUs. And we have customers running thousands of GPUs in just 1 cluster, managing it all on AWS. We have also customers on Azure and Google. So we're a so-called hybrid solution. With Run:ai, you get just 1 platform with which you can run workloads on premises or in the cloud; it's all the same interface, same tools. In the stack, we run with Kubernetes, with cloud native, so we can run on any Kubernetes distribution. It can be on-prem Kubernetes, or we can run on any managed solution from the cloud providers. And the end users, data scientists and analysts, can use our tools to train their models or deploy their models. It's both training and inference. And we're really taking an integrative approach, so we can integrate with any tool that runs on top of Kubernetes. We work with a lot of tools in the ML ecosystem: experiment tracking tools, other tools for orchestration, workflow orchestration tools. We're open; I think that's really important because the field moves so fast and new tools are being created every day. So that was 1 of our goals, 1 of our ways to operate: we want to be as open as possible.

Nathan Labenz: 13:18 A lot of that sounds familiar. We just had another episode with a couple of guests from MosaicML. And I've made a couple of jokes about the Cognitive Revolution bump that obviously led them to a good outcome very shortly thereafter. I don't want to get overly bogged down in comparing 1 company against another, but how would you compare and contrast the business that you've built versus how you see Mosaic? Is it just the sort of thing where the industry is growing so fast that you guys don't even worry about competition? Or do you see that there are head to heads and you have meaningfully different positioning in the market?

Dr. Ronen Dar: 14:00 Okay. So Mosaic, first of all, they started after us. They started, I think, like 2 and a half years ago, and we started 5 years ago. Mosaic really has a clear focus on generative AI and large language models. We're more general than that. But I think the main thing is that we came and built our platform very much from the bottom up. We came a lot from the GPU itself, from the hardware itself, and we saw all the issues and problems with utilizing GPUs. So when we built our platform, we really understood very well the software stack on top of GPUs, and we saw the limitations of those software libraries and frameworks.

Nathan Labenz: 14:52 Hey. We'll continue our interview in

Dr. Ronen Dar: 14:54 a moment after a word from our sponsors. And we built 2 main components, I think, and both are core to our technology. One thing that we built is a GPU optimization layer. We built what we call CUDA-level virtualization: we sit at the CUDA level, we intercept CUDA calls, and we control access to GPUs. And we do that to enable better access to GPUs. With that layer, workloads can share a single GPU. We have this feature called fractional GPUs, GPU fractionalization, so we can fractionalize 1 GPU and it can be shared. And we know how to swap memory between CPU and GPU, so people can provision GPUs to run more models on the same GPU. So essentially, we bring a lot of capabilities for utilizing GPUs much better. The second component is more of a scheduling and orchestration layer. We saw that Kubernetes is the state of the art infrastructure and management framework today in the cloud. But Kubernetes was built for running microservices on commodity CPUs, whereas AI workloads are totally different. AI workloads are really intensive on GPUs, and there's a lot of experimentation in how data scientists and machine learning engineers run their workloads. And we saw that there are a lot of scheduling capabilities missing from Kubernetes that are actually available in other ecosystems, ecosystems like high performance computing, HPC, or the scheduling capabilities in the Hadoop ecosystem. The YARN scheduler is an amazing scheduler that was built, I think, 10 years ago, and it brought a lot of scheduling capabilities that were really needed for running Spark workloads. So we took concepts from those worlds, from HPC and YARN, and we brought them into the Kubernetes world. We're bringing in batch scheduling capabilities, the ability to preempt workloads, queue workloads, and do gang scheduling. So a lot of advanced scheduling. 
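The GPU fractionalization Ronen describes, several workloads sharing a single GPU as long as their memory demands fit, can be modeled with a toy allocator. This is an illustrative sketch with invented names, not Run:ai's actual mechanism, which enforces sharing at the CUDA/driver level:

```python
# Toy model of GPU fractionalization: workloads share one physical
# GPU as long as their requested memory fractions fit. Illustrative
# only; real systems enforce this below the framework layer.

class FractionalGPU:
    def __init__(self, total_mem_gb):
        self.total_mem_gb = total_mem_gb
        self.allocations = {}  # workload name -> fraction of GPU memory

    def request(self, workload, fraction):
        """Admit a workload if its memory fraction still fits."""
        used = sum(self.allocations.values())
        if used + fraction <= 1.0:
            self.allocations[workload] = fraction
            return True
        return False  # would oversubscribe the GPU

    def free_gb(self):
        return self.total_mem_gb * (1.0 - sum(self.allocations.values()))

gpu = FractionalGPU(total_mem_gb=80)     # e.g. an 80 GB A100
print(gpu.request("notebook-a", 0.25))   # True
print(gpu.request("inference-b", 0.5))   # True
print(gpu.request("training-c", 0.5))    # False: only 25% left
print(gpu.free_gb())                     # 20.0
```

The design point is admission control: a request is either granted a guaranteed slice or rejected up front, rather than letting two jobs race for memory and crash at runtime.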
And at the end, what it allows teams to do, first of all, is get a guaranteed quota, a guaranteed way to get access to their GPUs, but they're not limited to that. They can always go beyond their quota and use more GPUs and run more workloads if there are available GPUs in their clusters. If there are idle GPUs, we allow other people in the organization to use those GPUs. So essentially, those scheduling capabilities bring higher efficiencies into GPU clusters and much more availability of GPUs. Suddenly, it becomes much easier for ML teams to get access to more GPUs very easily, on demand. So we're increasing the efficiency with which GPUs are utilized, and we're increasing the availability of GPUs. And availability of GPUs is a serious, serious problem. So I think that is really unique for us, the fact that we bring a lot of efficiency and increase the availability of GPUs. I don't think anyone else in the world is doing that. Beyond that, we also focus a lot on simplicity. We're helping ML teams to easily train models and easily deploy models. We invested heavily, and are still investing, in providing tools and ways to abstract away the complexity of just running workloads, just training models and deploying them. I think without the right infrastructure solution, without the right platform, teams will spend 90% of their time on infrastructure: setting up libraries, setting up environments, setting up a lot of stuff. And for them to scale up or scale out their experiments is really difficult. If they're running just 1 experiment and want to run it on a lot of GPUs, tens of GPUs, instead of 1 GPU, that's particularly difficult. Or if they want to run a lot of experiments in parallel, scaling out those experiments, that's also really difficult. So we bring a lot of tools to allow researchers and AI practitioners to iterate, scale up, and experiment very easily. 
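The quota model described here, guaranteed shares plus opportunistic use of idle GPUs, can be sketched in a few lines. This is a simplified illustration of the idea, not Run:ai's scheduler; the function and field names are invented:

```python
# Sketch of guaranteed-quota scheduling with opportunistic overflow:
# each team is granted up to its quota first, then idle GPUs are
# shared with teams that asked for more. In a real scheduler, the
# borrowed GPUs are the ones preempted first when owners return.

def schedule(total_gpus, quotas, requests):
    """Grant each team min(request, quota), then share idle GPUs."""
    granted = {t: min(requests.get(t, 0), q) for t, q in quotas.items()}
    idle = total_gpus - sum(granted.values())
    # Hand out remaining idle GPUs to teams that asked for more.
    for team in quotas:
        want_extra = requests.get(team, 0) - granted[team]
        extra = min(want_extra, idle)
        granted[team] += extra
        idle -= extra
    return granted

# 16-GPU cluster, two teams with 8-GPU quotas.
# Team A wants 12, team B wants only 2, so A borrows B's idle GPUs.
print(schedule(16, {"a": 8, "b": 8}, {"a": 12, "b": 2}))
# {'a': 12, 'b': 2}
```

When both teams fully use their quotas, nobody can borrow, which is exactly the "guaranteed" part of the guarantee.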
So for us, I think it's about simplicity and about bringing efficiency. And on the efficiency part, I think no one else provides the value that we bring. We have customers that have 4x'd, and even more than that, their GPU utilization, increasing their GPU availability. So together with the simplicity, I think we're bringing a really unique offering to enterprises in the world.

Nathan Labenz: 19:56 Yeah. Okay. Cool. That's very helpful. And it is certainly relevant in the context of the shortage. I want to turn in a second toward the macro context that's driving all of the concern and focus on utilization. These things are not cheap by default, and they're getting bid up at the current moment. So that puts you in a position to be even more valuable. Am I understanding correctly that this sort of assumes dedicated capacity? Because when you say it could be more efficient, that could mean if you're buying on demand access that you buy less. But as you talked about it, it sounded more like there's an assumption of certain either physical infrastructure that a client has or at least some sort of contractual commitment that now they have a certain amount of capacity, let's make the most of it and not have our data scientists waiting around when they could be getting to work.

Dr. Ronen Dar: 20:57 So that's another good question. From what we see, big enterprises, big organizations, usually secure access to GPUs. Just getting access to GPUs, to a lot of GPUs, can be really difficult. You can go to your AWS account manager and ask them to increase the limit on your GPU instances, and you can try to do that. But if you try to increase your limit from 10 GPUs to 1,000 GPUs, for example, that's a big deal. You get a lot of questions while trying to do that. So just securing access to GPUs and having them available for you, that's a big part. It makes enterprises right now buy and reserve those GPUs. So we see a lot of them using reserved instances of GPUs, huge clusters of GPUs. That's what we see for enterprises. For sure, reserving is something that is being done when it comes to GPUs. Now, when it comes to smaller companies, for them, on demand access to GPUs is totally relevant. Many startups don't want to make a big investment upfront in a lot of GPUs, so many of them are using on demand, or maybe it's a combination. And in that case, we also have a lot of startup companies that work with us and are our customers. In that aspect, what we get is clusters that are elastic. Those Kubernetes clusters, maybe they're multiple clusters in different regions, and just deciding where to run the workload, in which region, and how to find the best GPUs in the right regions around the world, that can be a challenge. So we're also helping customers get access to clusters that might be spread across different regions, and they might be dynamic clusters that you can scale up or down. So for sure, it's about reserved and about on demand instances as well.

Nathan Labenz: 23:10 Yeah. So let's talk a little bit more about access and the nature of the market. For context, I don't have a lot of experience in this, and I imagine most of our listeners don't. Unless you're one of a small pocket of people at a company that's going to make this kind of capital investment or commitment, you're probably like me, mostly paying the hourly rate for GPUs as you go. That's worked fine for me; I haven't done anything at huge, huge scale. But obviously the biggest companies in the world are waking up to the fact that there is this shortage. Maybe for starters, can you just tell me: how do you understand the shortage? How bad is it? Are we now in a period of eternal GPU shortage until further notice? And what do people actually go through to buy? Googling this, or even using Perplexity to search for information about it, it's not super obvious. You can go get one here on eBay, and then there's an H100 on Amazon in a random spot. But it's not really obvious how the big buyers interact with NVIDIA in the first place to even get bulk orders. So maybe you could demystify some of that for us a little bit.

Dr. Ronen Dar: 24:27 First of all, I think the GPU shortage is relevant to whoever buys GPUs for their on premises environment, or for companies or people trying to get access to the high end, the most advanced, the newest GPUs in the cloud. Getting access to those newest GPUs can be a challenge if you need a lot of GPUs. I think what we have seen in the last 6 months is that the demand for GPUs increased amazingly fast. It grew in a way that NVIDIA didn't foresee. OpenAI came out with ChatGPT, and they showed the entire world what can be done with large language models and generative AI. And then other companies followed: Microsoft and all the cloud providers, and Elon Musk with his new company, he went and bought 10,000 GPUs just a few months after ChatGPT came out. So a lot of companies and all the cloud providers went to buy more GPUs from NVIDIA, and that created a big jump in demand for GPUs. With hardware, compared to software, it's much more difficult and much more complex to supply the demand when you have unexpected changes in demand. So that's what happened. NVIDIA needs to change the pace at which they manufacture those GPUs. We're speaking about hardware here: there are machines that are actually manufacturing those GPUs. To manufacture more GPUs, you need more machines, and it takes time to build and set up those machines and get them operating. So supplying that demand and bridging that gap between demand and supply when it comes to hardware takes some time. And I'm sure NVIDIA is working on it, and I'm sure they are increasing the pace at which they manufacture their GPUs. I think the GPU shortage will go away with time. I don't know how much time it will take for the shortage to go away, but a lot of companies are still waiting for the GPUs that they ordered. The H100 is the newest GPU people can buy, and a lot of those companies are waiting in line. So that's the GPU shortage. 
I think it's really interesting because, as we spoke about, there's the trend of using more and more compute to train large models. But I think for the first time, we're also seeing a huge increase in the demand for GPU power for inference, for running those models in production. These large generative AI models are so huge that usually they don't fit into the memory of 1 GPU. So if you want to run state of the art models with hundreds of billions of parameters, typically you need a lot of GPUs: 4 GPUs, 8 GPUs, maybe even more than that. So now we're seeing a significant increase in the compute requirements for inference workloads. And that's significant for every company that is actually running models in production, like OpenAI and like GitHub with Copilot. I'm sure they operate at scale. So Microsoft, between OpenAI and GitHub Copilot, needs to manage a lot of GPUs for inference. For sure, this contributed to the demand for GPUs very significantly. And I think what we've seen is that the pace at which AI applications and AI innovation are happening is much faster than the pace at which GPUs, the hardware, can be produced. It relates to Moore's law, and it relates to other issues as well. But the AI space, AI innovation, moves at an exceptionally fast pace, much faster than the capabilities of the hardware. When that happens, you get more and more demand for hardware, and then you get these problems where demand grows too fast and supply doesn't manage to catch up. So my prediction is that this is not the last time we are seeing a GPU shortage. I don't know when the next GPU shortage will happen, but I think it will happen again. And it has major consequences on the industry, on how companies are operating, on the cost of AI. So there are a lot of interesting aspects related to that.
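The point that hundred-billion-parameter models don't fit on one GPU follows from simple arithmetic. A rough sketch, under assumed numbers (fp16 weights at 2 bytes per parameter, 80 GB GPUs, ignoring activations and KV cache, which push the real count higher):

```python
# Back-of-the-envelope: how many GPUs does it take just to hold a
# model's weights for inference? Rough estimate ignoring activations,
# KV cache, and framework overhead, which all add to the total.

def gpus_needed(params_billions, bytes_per_param=2, gpu_mem_gb=80):
    """Minimum GPUs to hold the weights (fp16 = 2 bytes/param)."""
    weight_gb = params_billions * bytes_per_param  # 1e9 params * bytes -> GB
    return -(-weight_gb // gpu_mem_gb)             # ceiling division

# A 175B-parameter model in fp16 is ~350 GB of weights alone:
print(gpus_needed(175))   # 5 (80 GB GPUs), before any KV cache
print(gpus_needed(70))    # 2
print(gpus_needed(7))     # 1
```

This is why serving frontier-scale models means multi-GPU tensor or pipeline parallelism even at batch size one, while a 7B model fits comfortably on a single card.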

Nathan Labenz: 29:56 So how does it actually work today if you want to go buy, start your own little cluster? Let's say you're, I'm not sure what a relevant threshold would be. Obviously, as you said, leading clusters are now getting into the tens of thousands. Inflection AI just made headlines with a huge raise, I believe with NVIDIA as a backer, and a lot of that's going to get plowed right back into the chips. I mean, at the highest level, there's clearly some strategic deal making going on where NVIDIA is investing in the people that it's going to be willing to ship the most chips to. But how does it go if I just want 100 H100s, or 1,000? Do I contract directly with NVIDIA? Is there a secondary market? Are there futures contracts that people sort of trade in? And then ultimately, because pricing seems fairly dynamic, right? I'm seeing headlines that say H100s now cost $40,000. Presumably that's not NVIDIA changing the list price; I couldn't even find a price on their website. So how does that actually work in practice to go through this buying process?

Dr. Ronen Dar: 31:13 There are a few decisions that one needs to make. I think one of the first decisions is where to invest: whether to invest in your own private environment, just buy GPUs and then operate those GPUs yourself, or to go with a cloud solution. And when I say buying GPUs and operating an on premises environment, it's not necessarily on premises; it can also be a colocation, where someone is managing your GPUs in a colocation environment. But still, it's not managed by a cloud provider; it's managed by a colocation provider. So colocation or managed services is one path, and cloud is another. There are huge, interesting trade-offs in cost there. But that's the first decision. If you choose to go with cloud, then you need to choose the cloud provider. You can actually choose multiple cloud providers, and it's not necessarily AWS, Google, or Azure. Right now we have amazing smaller cloud providers, like Lambda Labs and CoreWeave. We're working with both of them, and both of them have amazing GPUs and provide access to GPUs. So you can go with that as well. On premises is a totally different story. Then you need to decide who to buy the GPUs from, whether it's going to be from an NVIDIA representative or from one of the system integrators. And then it's about choosing the right GPU type: whether you want to go to the highest end, the newest GPUs, H100, or you say, okay, I'm fine securing access to A100s or V100s; that will be good enough for me, and the cost will be reasonable. So there are trade-offs around that. And there are trade-offs not just relating to the GPU: there are trade-offs relating to storage. Where are you storing all your data? How performant is it? And questions around networking. 
If you're running small scale experiments, maybe networking is not that important for you. Or even NVLink, those interconnects that connect the GPUs, maybe those are not that important for you. But if you do train models on multiple GPUs or multiple machines, then networking becomes really crucial for your performance, and you need to make some decisions there as well. So it's those decisions combined: cost, performance, and the needs of the users, I guess. We're seeing customers building huge on premises environments and still securing access to GPUs in the cloud. And we see customers doing just cloud, securing access to GPUs in different clouds. But the world right now, I think everyone is seeing, is hybrid cloud. It's there, and it's actually the best option, because you don't want to lock yourself in to one provider. So we're seeing enterprises choosing two or more providers, plus an on premises solution.
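The buy-versus-rent decision he walks through often reduces to a utilization break-even, which is easy to sketch. All numbers below are illustrative assumptions, not quotes from the conversation:

```python
# Break-even estimate for buying a GPU vs. renting in the cloud.
# All numbers are illustrative assumptions, not real quotes.

def breakeven_hours(purchase_cost, cloud_rate_per_hour, overhead_factor=1.3):
    """Hours of use at which owning beats renting.

    overhead_factor folds power, cooling, and ops costs on top of
    the purchase price (assumed to be 30% here).
    """
    return purchase_cost * overhead_factor / cloud_rate_per_hour

# Hypothetical: a $25,000 GPU vs. $2.00/hour on-demand.
hours = breakeven_hours(25_000, 2.00)
print(round(hours))                 # 16250 hours
print(round(hours / 24 / 365, 1))   # 1.9 years of 24/7 use
```

The intuition matches what Ronen describes: buying only pays off at sustained high utilization, which is why enterprises with steady demand reserve or buy while startups with bursty demand rent on demand.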

Nathan Labenz: 34:46 Prices today: an A100, from what I saw on Amazon, was $7,000. And H100, I'm seeing $40,000 in headlines, but I'm guessing that is probably more a reseller price, not what people are mostly paying. Could you give us general guidance on what these things actually cost in the market today?

Dr. Ronen Dar: 35:10 If it's cloud prices, then they're all out there; cloud providers publish their prices. An A100 machine costs between $30 and $40 an hour, so that's really expensive. H100s will be more expensive. They are still not available on AWS, but I think they are available on Azure, for example, and CoreWeave is already offering H100s. So you need to pay a lot for the newest GPUs, but they will be more performant, and it's important to understand that it's not just about the absolute cost of the GPUs, it's also about the performance. Because if a GPU costs 2 times more, but your workloads are actually 4 times faster on it, then it's actually the best solution for you, both cheaper and faster. The H100, according to NVIDIA benchmarks, is much more performant compared to the A100 and previous GPUs. Usually, GPUs become more and more performant; the actual performance increase really depends on the workloads themselves, but the newer GPUs also come with higher costs. Now, when it comes to on-premises, if you go to NVIDIA and buy state-of-the-art DGX machines, then you can pay hundreds of thousands of dollars for one machine. It would be a very high-end, better-performing GPU machine, but you pay hundreds of thousands of dollars.
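The "2x the price but 4x the speed" point can be made concrete with a line of arithmetic: what matters is the cost to finish a job, hourly price divided by throughput. A minimal sketch, with illustrative hourly rates and speedups (assumptions, not quotes):

```python
def effective_job_cost(hourly_price, relative_speedup, baseline_hours):
    """Total cost to finish a job that takes `baseline_hours`
    on the baseline GPU, on hardware `relative_speedup` times faster."""
    return hourly_price * (baseline_hours / relative_speedup)

# Suppose a job takes 100 hours on an A100 machine at $35/hour,
# and the newer machine costs 2x per hour but runs the job 4x faster.
a100_cost = effective_job_cost(hourly_price=35.0, relative_speedup=1.0, baseline_hours=100)
h100_cost = effective_job_cost(hourly_price=70.0, relative_speedup=4.0, baseline_hours=100)

print(a100_cost)  # 3500.0
print(h100_cost)  # 1750.0  (twice the hourly price, half the total cost)
```

This is why the per-hour sticker price alone is misleading: the faster GPU here is both cheaper per job and finishes sooner, exactly the trade-off Dr. Dar describes.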

Nathan Labenz: 37:04 One quick follow-up on the on-demand pricing. When I was researching for this, Lambda Labs seemed to have the lowest A100 price I was able to find online, which was just over a dollar, I think $1.10 an hour. And they quote AWS prices at $4 an hour. Were you maybe citing something different there, or is that accurate in your understanding?

Dr. Ronen Dar: 37:29 Okay, so I said between $30 and $40 an hour for a whole GPU machine with 8 A100 GPUs, an 8-GPU box, and that's on AWS. So a significantly lower price is one of the advantages of Lambda Labs and CoreWeave, for sure.
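To reconcile the two sets of numbers in this exchange, the per-machine price has to be normalized to a per-GPU rate. A quick sketch using the approximate figures mentioned in the conversation:

```python
# Per-machine cloud pricing normalized to per-GPU-hour,
# using the approximate figures cited in the conversation.
machine_price_low, machine_price_high = 30.0, 40.0  # USD/hour, 8x A100 machine
gpus_per_machine = 8

per_gpu_low = machine_price_low / gpus_per_machine    # 3.75 USD/GPU-hour
per_gpu_high = machine_price_high / gpus_per_machine  # 5.0 USD/GPU-hour

lambda_labs_per_gpu = 1.10  # USD/GPU-hour, the figure Nathan cites

print(per_gpu_low, per_gpu_high)                       # 3.75 5.0
print(round(per_gpu_low / lambda_labs_per_gpu, 1))     # 3.4
```

So the $30-$40 per machine and the roughly $4 per GPU figures are consistent, and the gap to Lambda Labs' quoted rate works out to roughly 3-4x per GPU-hour.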

Nathan Labenz: 37:53 How is that sustainable? I mean, I understand there being a big delta between making an upfront investment in on-premises physical hardware, even if you're colocating it, versus the flexibility of the cloud. But it's surprising to me that there would be a 4x difference, especially in the presence of companies like yours that are multi-cloud and help you optimize and shift workloads around. How do you understand that 4x difference in on-demand pricing?

Dr. Ronen Dar: 38:24 That's an amazing question. I don't know. I really don't know. Actually, I agree, it's a big difference. You could just ask CoreWeave. I think it's an interesting question to ask some of the cloud providers.

Nathan Labenz: 38:42 Their H100 pricing is basically $2 an hour. So they're renting the H100 at half the AWS A100 price, which is definitely pretty confusing. Okay, well, I'll continue to try to figure that one out. So let's maybe move on, and I don't know if this will be in scope or out of scope for your business. If I understand the technology as you've described it, you're at the CUDA layer, the NVIDIA proprietary software layer. From what I understand, and I haven't explored the alternatives myself, the fact that the software actually works is often said to be one of NVIDIA's huge value drivers and strategic advantages, and you're injecting another layer into that and even adding more functionality. But I was also curious: what other chips, or other chip efforts, are relevant today? Obviously NVIDIA has huge demand, the stock price reflects that, and people are waiting for their products. Do you see yourselves trying to partner with other chip makers? Whether you do or don't, I'd love to get your take on which other chip makers are going to be relevant over the next few years as AI presumably continues to scale and scale and scale.

Dr. Ronen Dar: 40:14 As you said, we're sitting at different layers of the software stack. We're sitting at the CUDA layer, and we're sitting at the Kubernetes layer, on top of Kubernetes, where we know what containers are running. So we're sitting at different layers, working very closely with the hardware layers, with Kubernetes providers, and with cloud providers. And I think the market for AI chips is really interesting. There are a few players, as I see it. NVIDIA, of course; people say that NVIDIA has 87% of the market, and that's a big number. The other players, as I see it, are first of all the cloud providers. AWS and Google have their own chips, and much of the work on those chips is happening in Israel. Israeli engineers are amazing; they are really good with hardware, and really good with software as well. So Amazon and Google are building their AI chips here in Israel, and Intel as well. NVIDIA also has a presence in Israel already; they bought Mellanox for their networking, and Mellanox is a major Israeli company.

Nathan Labenz: 41:48 That's all fabless design, right? Are there actual fabs going in as well, or am I right in understanding this to be the design layer?

Dr. Ronen Dar: 41:57 No, that's fabless, that's fabless. And that's really interesting. Now, Intel is a different story. Intel always had their own fabs for their own usage, and now they're moving toward offering their fabs to others. But that's another interesting topic. In terms of AI chips, all of those players are established. AWS already offers chips for training and chips for inference in their cloud. Google, for several years already, I think more than 5 years, has had their TPUs in their cloud. Then there are players like AMD and Intel. AMD is a really strong competitor, because their GPUs are used in a significant way in the gaming industry, so they are strong there, and their GPUs are really good; their technology is really good. We're starting to see more and more AMD usage; we have customers running workloads on AMD. And Intel is also investing in building those chips. And there are a few startups that are building their own chips as well, and that's interesting. As you said, NVIDIA right now is controlling the market, and it controls the market largely because of their software stack, the software ecosystem. It starts with the CUDA platform, but it goes much beyond that. NVIDIA is investing a lot of effort in putting out software libraries, frameworks, and tools in the AI ecosystem, just to enable more and more workloads to run on GPUs. As you said, it's really easy to run workloads today on NVIDIA GPUs; sometimes it's more difficult to run them on other chips. But I think with time, it will become easier and easier. So that's the ecosystem. And I think also, really, NVIDIA is a great company.
The GPUs and the technology that they are building are really advanced. They are building what the market will need in 2 or 3 years, and they already offer it now. They came out recently with the H100 offering, and there's really interesting technology there: they are offering really big GPUs with a lot of memory. They are already seeing what the market needs, and I think right now the market needs to run huge models with a lot of memory; memory has become such a bottleneck. They have this offering already out, and we support those big models. So I think NVIDIA is a great company. But as time goes on, we'll see a second player emerge, and it will be really interesting to see who that second player will be.

Nathan Labenz: 44:56 So it sounds like you'd probably put AMD in the second position right now, in terms of at least being a proper rival to NVIDIA. We've got Amazon's chips and the Google TPUs. Microsoft has recently announced that they're designing their own chip too. I don't know if I mentioned Meta yet, but they've had their own chip and have one of the biggest clusters in the world as well. What about other companies that didn't come up there, like Samsung or Tesla? Do you see those two as potentially big players?

Dr. Ronen Dar: 45:31 Maybe, maybe. Tesla, and you forgot to mention Qualcomm as well. Qualcomm is also a player that you don't want to underestimate. Qualcomm and Samsung have their own offerings, and they might have an interesting offering when it comes to inference, running models at the edge. That's a market that will continue to grow, and there are opportunities for sure for players that are traditionally good in those edge-device markets, like Qualcomm and Samsung. But the AI market is really different from previous markets. First of all, because software is really important; it's not just hardware. These days you buy hardware and software together. You must have that software stack on top, and it's a critical enabler. And also, the innovation in this space moves so fast that the hardware manufacturers need to move fast as well; they need to deliver more innovation at a faster pace. So I think this AI space is very different compared to previous markets that we saw.

Nathan Labenz: 46:48 Are there any other smaller or specialist companies that you would suggest keeping an eye on? One that comes to mind for me, because we had CEO Andrew Feldman on as a guest, is Cerebras Systems, and they've made, as I'm sure you're aware, the biggest chip ever, a very different approach, obviously. Do you see them, or other smaller, more niche chip companies, being able to grow in a significant way?

Dr. Ronen Dar: 47:17 Yes, Cerebras. They actually have an amazing offering, very unique. I haven't seen anything like their chip. I haven't tried it, but it's really unique and really differentiated. And let's see, they might have a real opportunity. I think the AI chip market is going to be huge, no doubt, and even small players can get huge revenues. So there are opportunities there, no doubt.

Nathan Labenz: 47:49 How about just the rest of the world? What do you think is the state of, or the outlook for, Chinese companies or any European companies? I'd want to get your take at a higher level as well on the geopolitics of all of this. Everybody's waking up to the fact that this is somewhat of a strategic resource. It's maybe not quite oil, but it's increasingly, I think, thought of as the next oil. Do you think countries like China or trading blocs like Europe can develop their own champions that can enter this top tier over the next few years, or are they just too far behind and it's impossible?

Dr. Ronen Dar: 48:38 Listen, this is a great question. A couple of years ago, people said the opposite, right? People spoke about all the innovation coming out of China, all the research and academic papers coming out of China, and they said, what's going on with the US? The US is behind. That was the story 2 or 3 years ago: the US is behind China. And see what happened. I think what happened in the last 6 months or year was amazing. It really showed the strength of Silicon Valley, all the innovation and technology coming from California, with OpenAI coming out with ChatGPT. The US really showed that in the last 6 to 12 months; things moved so fast. I think China now is behind. The geopolitical issues, right? That's a really interesting topic with different angles, and the chip angle is a big one. Chips are strategic, not just for companies but for countries, and we're seeing this chip war with China, really interesting stuff around AI. I think there is a race right now: a race within the Western world, and a race with China happening as well. And it's not stopping; things are moving so rapidly. You could think, for example, about OpenAI and ChatGPT. A few months ago, everyone spoke about how good ChatGPT was compared to everything else, that it was much, much better than anything else in the market. And now I'm starting to hear that other model providers offer very good models, maybe even something better than ChatGPT. Things are moving so fast; technology gaps are being closed very fast. Right now, my analysis is that China is behind the US, but I don't know how fast that gap will be closed. I guess it will be closed; things are moving so fast. So it's really difficult to predict right now, in the AI space, what will happen in 6 months or a year.

Nathan Labenz: 51:12 Yeah, unfortunately, I totally agree. I always say my crystal ball gets real foggy after about 6 months, so I'm with you on that for sure. Obviously, I have a US-based perspective. You spend time here and also in Israel, so I'm interested: is there a difference in worldview around AI dynamics? Here, I think we honestly have what is, in my view, a counterproductive framing of AI as this new front in the contest or rivalry between the US and China. And I'm not really comfortable with the idea that we are creating these seemingly pretty severe escalations, like trying to cut China off from leading-edge chips. To me, that seems to really increase the risk of other kinds of conflict, because, for example, if they can't buy any of the chips from Taiwan, then maybe that makes the cost of disrupting production in Taiwan much easier for them to bear. Maybe I'm a simpleton on that, but how would you describe it, watching from maybe not a neutral country, but certainly a country that's far from both of the two would-be hegemonic rivals? How does the rest of the world see this US-China dynamic?

Dr. Ronen Dar: 52:45 I think, from my perspective, the geopolitical issues have really been escalating over the last 10 years, and the biggest is China and the US. Now, Israel is a very small country, and we have a lot of political issues and a lot of challenges with our neighbors as well, right? So we are always in need of protecting ourselves against other countries that threaten us, we might say. For us in Israel, we have always been in this state of mind that there is danger from the countries acting against us. So, yeah, it's a big question, a huge question. I don't like to see things escalating geopolitically, but it doesn't seem to be going in a good direction, right? But I'm optimistic by nature, so I always hope things will become better.

Nathan Labenz: 53:59 It's obviously impossible to characterize Israeli opinion briefly. I know there's a lot of disagreement and contention on everything, but you could either speak for yourself or try to characterize it however you would. From that middle part of the world, does China look like an adversary? Obviously, there are a lot of local dynamics with neighboring countries, which are much more historically rooted, but does China look like an adversary from Israel?

Dr. Ronen Dar: 54:31 From our perspective, Iran is the adversary; Iran is the number 1 adversary for Israel. And from our perspective, Iran is aligned with China, on the other side. I think that's one of the strengths of the partnership between Israel and the US. From our perspective, Iran is number 1, I would say. And from that perspective, I think the Israeli people really see eye to eye with the American people: they don't like to see things escalating, but they also see the danger. So there has already been collaboration between Israel and the US around those topics for several decades. But as I said, I'm optimistic; I am always hoping for the best. Going back to AI: AI for sure has an impact on conflict, because now it's a race. Countries are arming themselves with AI technology, and AI technology is going to change the defense space; there's no doubt the defense space is going to change. For AI, chips are really important. Chips are strategic, AI is strategic for countries, and I think you will see the geopolitical picture change in the next decade because of AI. It's interesting.

Nathan Labenz: 56:04 Do you see this sort of supply control as likely to be viable for controlling who can do what with AI? That could be in the context of China, right? There are these export controls, and they can only buy H800s instead of H100s. And then I wonder, is that really gonna work? Or is there gonna be some way to get around that, either post-manufacturing or maybe through very clever software work that could circumvent some of these imposed restrictions? And then I also think about that in the context of the AI safety dialogue or discourse in general. People are very hopeful, and I'm not sure how realistic these hopes are, that maybe we could have a sort of know-your-customer regime for GPUs, where you can only buy so many before you've gotta have a license or a permit to do whatever. Do you feel like this is a controllable enough resource that it's ultimately gonna be fruitful to try to control the development of AI through controls on hardware?

Dr. Ronen Dar: 57:18 That's a great point, right? People are speaking about regulating AI, and coming from the compute angle, that can be an interesting approach: just controlling access to compute. That's what we have right now with the chip war between the US and China, right? They are controlling access to the newest GPUs, so China is not allowed to get access to the newest GPUs under the US regulations. Now people are speaking about AI safety, about regulating the future of AI and how AI is being used. Can that be controlled through compute? I don't know. Because, and I'll get a little bit technical here, if you're training huge models and trying to get to state-of-the-art models, then you need a lot of compute. But right now, you have these open source models. You can fine-tune them on your data, and you don't need that kind of GPU power to do it. Just yesterday, Meta, and I love what Meta is doing right now, open sourced Llama 2, which is free for research and commercial use. That's really, really exciting. Now, if you have those very capable models open sourced, people can take their own data and fine-tune those models using that data, and they don't need a lot of compute. So then controlling access to compute becomes less relevant, I guess. It's an interesting aspect. Regulating AI, I don't think it's an easy question. It's really complex.

Nathan Labenz: 59:17 That's probably a good note for us to leave it on. I know that regardless of how all this develops and there is tremendous uncertainty, you and the team at Run:ai will be helping people get the absolute most out of their investment in GPUs. So Dr. Ronen Dar, thank you for being part of the Cognitive Revolution.

Dr. Ronen Dar: 59:38 Thank you, Nathan. That was great fun. Thanks.
