The AI Revolution in Medicine with Dr. Isaac Kohane of Harvard Medical School
Nathan interviews Prof. Zak Kohane on AI's transformative potential in medicine, highlighting his early GPT-4 access and insights from his new book.
Watch Episode Here
Video Description
Nathan sits down with Professor Zak Kohane, Chair of the Department of Biomedical Informatics at Harvard Medical School and co-author of the new book "The AI Revolution in Medicine", for which OpenAI CEO Sam Altman wrote the foreword.
Professor Kohane was among a select few people to receive early preview and research access to GPT-4 in the fall of 2022, and the book captures his approach to exploring and characterizing how AI is about to transform medicine.
The book is out May 13, 2023 and can be ordered here: https://www.amazon.com/AI-Revolution-Medicine-GPT-4-Beyond/dp/0138200130/ref=sr_1_1?crid=11PK3RBYATA5M&keywords=carey+goldberg&qid=1682346404&sprefix=carey+goldberg%2Caps%2C116&sr=8-1
Regardless of the field you're in, Dr. Kohane's combination of deep immersion, enthusiastic exploration, pragmatic optimism, risk awareness, realism, and forward-thinking vision make this a worthy example for others to study and emulate.
TIMESTAMPS:
(00:00) Episode Preview
(05:29) Dr. Isaac Kohane’s story
(15:00) Sponsor: Omneky
(16:29) Advice to others thinking of applying AI to their own disciplines
(20:33) The tension of using AI in medicine
(25:04) The Trial Paradigm of AI
(31:15) Is it possible to use GPT as a healthcare provider in population studies?
(34:14) The Trainee Paradigm of AI
(36:09) The Partner Paradigm of AI and how doctors should use AI
(40:33) AI provided interaction and how that can improve patient care
(42:26) The Torchbearer Paradigm of AI
(42:59) Can GPT independently conduct medical research?
(46:58) The future for impactful use cases of AI in medicine
(51:04) Integrating AlphaFold into language models
(54:10) AI-alignment and patient data in medicine
PODCAST RECOMMENDATION: Upstream with Erik Torenberg
YouTube: @UpstreamwithErikTorenberg
Audio: https://link.chtbl.com/Upstream
TWITTER:
@CogRev_Podcast
@labenz (Nathan)
@zakkohane (Isaac)
@eriktorenberg (Erik)
Thank you Omneky for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.
More show notes and reading material released in our Substack: https://cognitiverevolution.substack.com/
Music License: CJRSKHSGXUURNNID
Full Transcript
Dr. Isaac Kohane: (0:00) That's when I became dumbfounded because I gave it very difficult cases. I asked GPT-4 what was the next thing it would do, and it went to a molecular diagnosis of something called 11-hydroxylase deficiency. If I ran into 100 random doctors, I'd be surprised if one of them would be able to do that. I think this is the mechanism to keep doctors up to date with the latest. I think it allows you to remember everything about the patient that you should have remembered. It will allow you to avoid errors. Absolutely no doctor should be without it. They should have this sidekick that is meticulous, completely up to date, ever vigilant, and sometimes wrong. And you're there to evaluate that. You can't compel doctors to Google the latest findings for their patients. But if we have this active agent that's listening in and looking at the record, we can really ensure that the patient gets that extra level of scrutiny.
Nathan Labenz: (1:09) Hello and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week we'll explore their revolutionary ideas, and together we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my co-host Erik Torenberg. Hello and welcome back to the Cognitive Revolution. Today I'm excited to begin a short series on AI in medicine. My guest today, Professor Isaac Kohane, is chair of the Department of Biomedical Informatics at Harvard Medical School and co-author of the new book, The AI Revolution in Medicine. Professor Kohane was among a select few people to receive early preview and research access to GPT-4 in 2022, and his approach to exploring, characterizing, and beginning to integrate AI into clinical practice constitutes the single best practical study of modern AI application that I've seen from academia to date. Sam Altman, in a foreword to the book, writes that "this book represents the sort of effort that every sphere affected by AI will need to invest in as humanity grapples with this phase change." I totally agree, and I believe that the key insight is the importance of expert hands-on use. To quote the book, "To really understand GPT-4, you need to use it and live with it. In the same way that no amount of reading and listening to others can tell you what it's like to ride a roller coaster, what it's like to interact with GPT-4 is similarly indescribable." By not just theorizing about the potential of AI, but truly embracing it, Professor Kohane and his co-authors were able to understand GPT-4's strengths, weaknesses, and quirks. Throughout the book and our conversation, he uses concrete examples to demonstrate GPT-4's capabilities and potential impact. 
Ultimately, he concludes that it is clinically superhuman and will become such a part of the fabric of medicine that to receive medical advice without GPT-4-like support will soon be considered substandard. At the same time, he does understand that it's such an unfamiliar and indeed alien intelligence, somehow both smarter and dumber than any person you've ever met, that human supervision and control remain critical. With a positive vision of human-AI symbiosis in medicine and the goal of allowing doctors to "reengage in medicine as an intellectual and emotional process focused on each and every patient," Dr. Kohane and his co-authors issue an urgent call for large-scale testing as well as public education around AI technology and its limitations. Regardless of the field that you're in, Dr. Kohane's combination of deep immersion, enthusiastic exploration, pragmatic optimism, risk awareness, realism, and forward-thinking vision make this a worthy example for others to study and emulate. The book, which I find myself recommending frequently, is The AI Revolution in Medicine. And I hope you enjoy this conversation with Professor Dr. Isaac Kohane. Professor Isaac Kohane, welcome to the Cognitive Revolution.
Dr. Isaac Kohane: (4:39) Glad to be here. Thanks for inviting me.
Nathan Labenz: (4:42) I'm super excited about this conversation because you have just released, or you're in the process of releasing, a new book called The AI Revolution in Medicine, which really couldn't be more on the nose for the topic of our podcast. I think it's really an exemplary exploration of cutting-edge AI tools and how they can make an impact in our lives and in an area that is obviously so critical and so complicated as medicine. I think it's really a phenomenal artifact that you and your co-authors have created. Before we get into that, I normally don't even do this, but just because there's so much skepticism and doubt and people call me AI hype boy all the time and things like that, could you maybe just give us a quick introduction to who you are, your background in medicine and IT surrounding medicine, so that folks have a sense for where you're coming from?
Dr. Isaac Kohane: (5:38) Sure. I grew up in Switzerland, came to the United States for college, learned about computers as an undergraduate even though I was a biology major, went to medical school, and then was terrified by learning—because I didn't have any doctors in my family—and learning firsthand that it was a noble profession, but not a great science application. So I bailed for a few years to get a PhD in computer science during the heyday of artificial intelligence in the 1980s. Then as we went into AI winter because of the disappointment after some of the overstated promises of that period, I completed my clinical training as a pediatric endocrinologist and started a research group. I was a professor at Children's Hospital, a professor at Harvard Medical School, first in pediatrics. Then I created a new department, Biomedical Informatics. I'm the chair of that department, and I have lots of bright young faculty doing great work using computer science and information processing techniques, all the way from genomics to clinical AI. I just started a new journal, a spin-off of the New England Journal of Medicine called NEJM AI, to focus on how we actually get clinical-grade validation for AI artifacts in clinical care. And along the way, I've been quite involved with a lot of health IT infrastructure as well, trying to get the data to flow well, and most of all for patients rather than for third parties.
Nathan Labenz: (7:22) So I would bottom line that by saying you're about as credentialed as you can possibly get in the world of the intersection of AI and medicine. You've seen a lot. It sounds like you've been through some ups and downs and seen a lot of things that didn't really pan out. Let's fast forward then to about six months ago. I'd love to just hear the story of how it came to be that you got this very early access to GPT-4, what that initial experience was like. And then, of course, we'll get into all the more conceptual issues. But I think the first-person narrative is so powerful. So give me a sense of that.
Dr. Isaac Kohane: (7:59) I had met Peter Lee from Microsoft, who's the head of Microsoft Research, several times in different venues. He's very easy to get along with. He understands academia well. He himself was department chair of computer science at Carnegie Mellon before DARPA, before Microsoft. And he calls me up and tells me that he has to tell me something, on one condition: that I can't talk about it with anybody else; he had to get top-level company clearance to talk to me about it. And this was before ChatGPT came out, but it was about the next thing, GPT-4. He showed me what it could do and that was already pretty exciting. But then he gave me early access to it directly so I could work with it. And that's when I became dumbfounded because I gave it very difficult cases. One was a case of a child in the nursery where there was a hole at the base of the phallus and you could not palpate testes. I asked GPT-4 what was the next thing it would do, and it went step after step, imaging workup, hormonal workup. As it went step by step I gave it the results, and it arrived at a molecular diagnosis of something called 11-hydroxylase deficiency. If I ran into 100 random doctors, I'd be surprised if one of them would be able to do that. I knew that there were a lot of limitations to this. I had seen it hallucinate, I'd seen it make up stuff, but the fact that it could have this dialogue with me at a very high level of medical sophistication on a large language model that was not particularly tuned for medicine was mind-boggling. And the fact that I could ask the same program questions about Talmudic advice and get expert commentary was unsettling and exciting at the same time. In fact, my immediate reaction when Peter told me was—my words failed me, which is unusual, I'm talkative. But I told him in the end I was not surprised that this happened, but I could not believe it was happening now.
I'd expected this would happen maybe five years later. But then as I realized what it could do, within a few days it occurred to me that it was going to transform the back-end business of healthcare: how money is paid, how it's billed, how it's transferred, how procedures are permitted or not, all the back-end stuff. It was going to transform boring administrative stuff that clinicians have to do. But also, since I knew it was going to be released to the public, it was going to change the level of expertise of patients who have been increasingly bereft of primary care support and who, out of desperation, use whatever data sources and knowledge sources they can. I mean, there is a reason why you've heard the term Dr. Google, because even though a lot of the sites you get sent to are either bogus or really wrongheaded, they're just not getting a good enough alternative to get expert advice. And in fact, with all its warts, so long as you have a human in the loop, so long as they eventually talk to a doctor, it's much better than what we have today. And that's why even with GPT-3.5 that people were experiencing before GPT-4, you were hearing lots of people using this for their own healthcare. This is not just an American phenomenon, this is an international phenomenon. And so it's very interesting when you have a transformative technology like this, which takes a fundamentally very conservative—sometimes for the right reasons—discipline like medicine, and changes some important power relationships. It's going to be fascinating to see how it plays out. I know there's going to be a lot of pushback. Some of it will be because of genuine concern about accuracy, about bias, but also there'll be a lot of sacred cows. People are making a lot of money by having a chokehold on how information flows in the medical care system and what kind of advice is given when. 
You could imagine that if a large language model tells you five reasons why there are alternatives to the surgery that was just proposed, and then you go talk to another doctor and say, that makes sense, the first doctor who proposed surgery is not going to be happy. That's just one example among literally millions of such conversations that are going to happen as a result of this democratization of knowledge. Of course, it's going to happen across many verticals, not just medicine, but medicine is such a personal, obviously personal part of our lives that it's going to make those issues quite real all of a sudden.
Nathan Labenz: (14:33) So that's a great overview, and there's a lot of little areas there that I want to follow up on and dig into in a little bit more detail. But before getting into the medicine-specific aspect of this, one thing that Sam Altman said in the foreword to the book, which I think is so true, is that the study that you guys have conducted here is really emblematic of the kind of study that is going to need to be done across a wide range of disciplines.
Nathan Labenz: (15:00) Hey, we'll continue our interview in a moment after a word from our sponsors.
Nathan Labenz: (15:04) I want to tell you about my new interview show, Upstream. Upstream is where I go deeper with some of the world's most interesting thinkers to map the constellation of ideas that matter. On the first season of Upstream, you'll hear from Marc Andreessen, David Sacks, Balaji, Ezra Klein, Joe Lonsdale, and more. Make sure to subscribe and check out the first episode with a16z's Marc Andreessen. The link is in the description.
Nathan Labenz: (15:29) For what it's worth, and listeners have heard me talk about this before, so this is just for you. I had a similar experience. I was an OpenAI customer. They had a customer preview program for GPT-4 as well. So right around the same time, I got essentially the same access and had the same mind-blowing experience where I was dumbfounded, stayed up all night the first night just trying thing after thing. So I can relate. But I want to hear a little bit more about—okay, once that initial shock wore off and you said to yourself, I've got some time here to really get out in front of this thing and try to figure out what's going on, what it can be, what it can do—how did you then approach that, and what advice would you give to others who are thinking about undertaking such a study in their own areas?
Dr. Isaac Kohane: (16:16) I think that what you have to do is actually get your hands on it and really put it through its paces on real-world examples of what you do. For example, I went through literally hundreds of different scenarios, only a few of which made it into the book, of doctors engaging with it with different questions at different points in the diagnostic and management process, all the way from initial screening to management questions to generating billing codes. I asked friends of mine what questions they had asked their doctors recently and ran it through those questions. And I think it's really important to see how it answers those questions rather than having a theory about it, because then you see where it does well and where it doesn't do so well. Right now, I don't see any alternative to having lots of examples like that. I then did the same thing for research. I wrote down on a piece of paper all the different parts of both clinical research and basic biomedical research, all the questions that you could ask. And I went through it, and there were some areas where it clearly was not ready for prime time and other areas where it did really wonderfully. And whether you're a consumer, a citizen, or running a business, I know of no other way than to actually do that assessment directly. Frankly, that's probably what motivated the New England Journal of Medicine AI, NEJM AI (even though I had decided to do it years before GPT-4 was announced): we need to have clinical-grade validation of these tools, and it's not obvious where in society this is going to happen. So at least I want to create a journal that will be one source of clinical-grade validation for this. But we need to see many such venues, and of course we'll see this across all verticals. If we don't see them across all verticals, that's where the danger is going to be.
And so, if you're a decision maker, I see no alternative other than to try it out yourself. Don't let other people tell you what it does. Use it in your own hands. And as with many exciting things like this, on the one hand it does some things much better than you could imagine, and on the other hand, at some things it's incredibly limited. You need to learn firsthand. I've tried to point out some of those in the chapters that I wrote in the book, but again, you need to learn firsthand.
Nathan Labenz: (19:28) I think that's truly phenomenal advice. A lot of times people call the AI moment right now pre-paradigmatic, and I think that speaks directly to your point about not just taking some theory and working from that. We don't have really good theories at this point for what these systems can do even now, let alone what they will be able to do in the future. The book is extremely quotable. And you just alluded to one of the quotes that I loved, which I thought crystallized something that I had felt and tried to articulate, but you guys did it better: that it is simultaneously smarter than and stupider than any person you've ever met. I thought that was genius phrasing.
Dr. Isaac Kohane: (20:12) Credit for that one goes to Peter Lee. That was a great quote.
Nathan Labenz: (20:15) I love it. Here are two quotes that I think are yours, and they seem to be in some tension, kind of like the smarter-stupider thing. I'll give you both, and then help us unpack this and give us a little more detail. First quote: "How well does the AI perform clinically? My answer is, I'm stunned to say, better than many doctors I've observed." But then, in some tension with that: "For the foreseeable future, GPT-4 cannot be used in medical settings without direct human supervision." So unpack that for us.
Dr. Isaac Kohane: (20:50) How impressed should we be that GPT-4 can ace most of the examinations that doctors have to take to get certified as doctors? I think it's impressive, and it's impressive how fast it's improved. GPT-3 did very poorly on the National Medical Boards a year ago, but GPT-4 is now top notch. But here's the thing: it's not a human being. And I'm not saying this in some human-first, bigoted sense. It's just that a lot of the common sense things that we have, that we can share among us, we can't assume are true of GPT. It can engage with you at an expert level about pretty complicated diagnostic and therapeutic problems at any given point, but every so often, when the stakes are huge, it can go off the rails and start arguing with you about a point where it might actually be wrong, or about something that is important to you but it doesn't think is important. And so, on the one hand, there's an old joke: what do you call the person who graduated at the bottom of their medical school class? The answer is doctor. And that means, of course, that the bottom half of doctors are worse than the top half. And I'm quite convinced that in the hands of that bottom half, GPT-4 can make them much closer to the top half, which would be a huge improvement in healthcare without even talking about the torchbearer, House kind of things you could do. Just getting us to practice medicine the way it should be done by better practitioners would be a tremendous advance in healthcare. But because we can't assume that it understands our values or what's important to the individual in their decision making, not having a human in the loop, I guarantee, is going to create lots of problems. I can tell you also, the more you know, the more you can get out of GPT. If you ask it more intelligent questions, it will give you more intelligent answers. And so, for the foreseeable future, yes, absolutely. And it will become a better and better tool.
It will be able to integrate not just text, but vision and sound. It will allow us to see things and catch things that we may have missed, because human beings are not always as attentive, as alert, as fastidious as they should be. But even an average, IQ-100 human being has a lot of shared values and baseline competence that you can't necessarily rely on GPT-4 to have. Now, I'm quite willing to believe that when GPT-7 is going around as a robot and living among us, it may be able to acquire those kinds of skills and those kinds of values, but we're far from that. And so today, human in the loop, until these generative models look quite different from what we see today.
Nathan Labenz: (24:46) You alluded to the torchbearer paradigm and you sketch out four paradigms for how an AI system like GPT-4 can be integrated into medicine. I think in a sense, these line up with ways that it could be integrated into all sorts of different areas of society. So I want to go through each one and just interrogate them a little bit. They are the trial, the trainee, the partner, and the torchbearer. So let's start with the trial. The notion here, as I understand it, is that you could think about GPT-4 as sort of like a drug, and you could try to evaluate it by having some people get it and some people don't, and how does that turn out? You ultimately seem to reject that as a paradigm. I didn't quite follow you all the way there. To me, I thought, well, geez, why not just run that trial? And if it makes people healthier, then that would be pretty meaningful to me.
Dr. Isaac Kohane: (25:41) We have in the other pre-generative model flavor of AI, in these convolutional neural networks that we use so successfully for image recognition, a very focused test. Retinopathy, looking at the back of the eye, or pneumonia, looking at chest X-rays. There we can define, here's a patient population, they have a certain characteristic, they come in coughing, let's say, and we're going to accept patients of this kind, and we're going to say how we're going to determine success or not success, which is how long are they going to live or did they get the right treatment. And we know how to do that. We've done it. FDA has done this for drugs. They've even done it for these AI widgets. And the merit there is it's very clear, especially if you do a randomized controlled trial, whether it is a statistically reproducible phenomenon. That's not going to work for GPT-4 for many reasons. One, when you start a conversation with GPT, you may start because you have a question around that patient coming in with a cough. But then very quickly, because GPT includes all of medicine, you could end up anywhere in medical diagnosis and therapy space. And that means that if you really want to do the trial, you'd have to compare it across all diseases. So it's a huge trial. It's not a narrow question like do they have retinopathy? It's, does this patient population get healthier when GPT was used? Those trials will happen, but it's going to be a big haul, a big lift. It's going to take thousands or hundreds of thousands of patients and it's also not going to be reproducible, because what you find as the effect of GPT assisting doctors in Massachusetts and Boston with a lot of health technology is going to be a different result than what you see in China, which has almost no primary care, or in Africa. Or even more practically, I can tell you that predictions, for example, of who is going to most likely die from COVID. 
We saw these AI models early in 2020, and hospitals a few hundred miles away had totally different performance with predictive models because the patients were different: different obesity levels and, though we didn't know it at the time, different race composition. It's going to be very difficult. Furthermore, unlike a small million-parameter model, these large models with billions of parameters are changing all the time. In fact, I know for a fact that the GPT-4 I'm working with now in April is not the same GPT I was working with in October 2022. And so it's very hard to stand up behind it and say, oh, it passed some trial test and now it's going to work. That works for drugs. Drugs are static. And even there, drugs are a bit of a problem because with different populations, they might have different results. But since this interacts with not just human physiology but the practice of medicine, with how doctors are different in different hospital systems, it's very difficult to validate. So the trial model is not going to be particularly useful for GPT as it's being used as a general-purpose medical problem solver. You could use it for a very narrow purpose like screening, but in that case it's not clear that purpose-built models might not outperform it. Its generality and linguistic supremacy is really where we're getting most of the lift.
Nathan Labenz: (30:17) I think there's a number of really interesting points there. One thing that, again, I think is representative of a bigger society-wide issue as well is that society and individuals and organizations are all going to change in response to AI. And then, not too far downstream of that, it will also be AIs interacting with AIs in ways that become increasingly hard to predict. And so I do think that is way under-theorized right now. It's one thing to say, holding all else equal, we give you GPT-4, how does that go? But then we give everybody this, and then we wrap them up as agents. I think we really are stepping into a great unknown. But I still want to challenge maybe just one more time on this question, because I'm reminded of an old RAND study, probably 40 years ago now, where they basically said: we're going to give free unlimited healthcare to one population, and no healthcare, or whatever they currently have, to another population, and just step back and observe over a period of a couple of years. I'm not an expert in that study; I think it's probably somewhat debated, and it would probably be highly debated with GPT-4 as well. But it does seem at least conceptually possible that you could do something at a very high level like that and then come back and ask: who's living, who's not, who's reporting well-being and who's not? And see if there is a significant difference.
Dr. Isaac Kohane: (31:48) I want to compliment you on your perfect skepticism, but when it comes to actually doing that trial, it'll be hard to bank the results. Let's say you do it for two or three years. How is it not contaminated? You give one healthcare system GPT and another healthcare system no GPT, just good doctors. If there's a difference, is it because of GPT or because of differences between the healthcare systems? You can do your best to match them up, but it's hard to do that. And people are going to hear about GPT, so is it really going to be a pure experiment? The patients are using it, and some of the doctors might be using it. I think it's actually practically very hard to do the experiment that you described in a way where you'd want to do more than have cocktail party conversation about it and say, oh, it really helped, and this is the percentage of bad outcomes and the percentage of good outcomes. I think that's going to be, for general use, very challenging.
Nathan Labenz: (32:54) Certainly, I do think it will be hard to avoid the leakage, and it certainly almost seems impossible to double blind. So it definitely challenges the paradigm.
Dr. Isaac Kohane: (33:05) And also just the confounding, the differences in healthcare systems. Fortunately or unfortunately, medicine right now is not cookie cutter. There's a lot of bad things that happen because it's not cookie cutter. There's a lot of latitude for doctors to make mistakes and to do things that are frankly remunerative but not necessarily clinically effective. But because of that, there is so much variation. The economic models of a for-profit hospital, a non-profit hospital, an academic hospital, and a community hospital are very different. Even different parts of the United States are different. So it's hard to know what the unit of study will be. I'm sure people will try it. My prediction is it's going to be hard.
Nathan Labenz: (33:54) Yeah, that sounds right to me. So next up is the trainee. The paradigm here is essentially to give an AI the same battery of tests we'd give to a doctor, and if it can pass, then one might think that's good. But again, I think you do a great job of keeping front and center the alien nature of the AI—a term I've also used and you use in the book. The fact is that underlying those benchmarks and tests is a fundamental set of assumptions about humanity and how people will act in situations and circumstances that we never test explicitly. We just can't make those same assumptions for the AI.
Dr. Isaac Kohane: (34:40) Exactly. For example, this really happens. God forbid you have a cancer diagnosis—there is a conversation to be had. Do you have a wedding you want to attend in six months? You might prefer to have a better chance of making it healthy to the wedding than having a better chance of living an additional year. Those trade-offs are a conversation that maybe one day these models can handle, but today they can't. They don't have that common sense. They don't have the human grounding yet.
Nathan Labenz: (35:29) I think that's perfect. And again, this is something that people should develop an intuition for by getting hands-on. You find these alien moments in your own explorations, and when it hits you, it hits you in a way that is hard to internalize by reading. I think your book does a great job of bringing that to the fore, but again, go try it, people. The partner paradigm—I think this is where we're landing right now, as your recommendation. You also use this term symbiotic medicine with the doctor, and it's really a three-party interaction now with the doctor, the patient, and the AI. I'd love to hear what you think the right ways of using a tool like GPT-4 as a clinician are. I understand that it's provisional, but what should the standard of care be as people have this? Could we see ourselves in a scenario where it becomes a violation of standard of care to not consult GPT-4 in the not-too-distant future?
Dr. Isaac Kohane: (36:35) I remember when I was in training, we did something that has now disappeared because the financial pressures of medicine are such that it's not done anymore. After clinic, we'd go over all the patients together as a group and explain what we had decided. Some people would say, "Hey, did you think about this?" And we'd say, "Oh no, I should call the patient and say we'd better do that." That additional reflection absolutely improved our quality of care. By analogy, and because increasingly it's going to be multimodal, imagine an AI of this kind listening in on the conversation, looking at the clinical notes, and saying all the time, "Hey, did you think of that?" You can say, "Oh, I already thought of that, be quiet," or "Oh, maybe I should think about that." Or it might say, "Did you know Mrs. Jones already had that test? Don't order another one," or "There's a new drug that's actually twice as effective with lower toxicity." We try to keep doctors up to date, and it's impossible even if they're trying, and not all of them try. I think this is the mechanism to keep doctors up to date with the latest and greatest in an expanding medicine that extends into genomics, for example, which so many doctors don't know about; sometimes patients know more about it than they do. It allows you to remember, in quotes, everything about the patient that you should have remembered. It will allow you to avoid errors. It will allow you to identify other patients that perhaps you should be paying more attention to. The way I think of it is, done well, absolutely no doctor should be without it. They should have this sidekick that is meticulous, completely up to date, ever vigilant, and sometimes wrong, and you're there to evaluate that. If I were a patient, I would want my doctor to have it. In the previous models, we couldn't compel doctors to keep up to date with the literature. We can't even compel doctors to Google the latest findings for their patients.
But if we have this active agent that's listening in, looking at the record, we can really ensure that the patient gets that extra level of scrutiny. And of course, patients are going to have access to the same level of expertise potentially. I think it's going to cause the healthcare system to have to raise the quality of its game.
Nathan Labenz: (39:59) Yeah, I appreciate that so much in the book, because there really is no putting the toothpaste back in the tube on this; the only way out is through. So I appreciate you confronting it head on. I was actually thrilled to see a mention of the app Replika in the book. We had the CEO of Replika on the show as one of our guests, and it was such an enlightening conversation about how lonely a lot of people are and how much even a pre-GPT system meant to people. She's had this app for years, and even when the system was so unsophisticated, it meant a lot to people. And now it's obviously accelerating.
Dr. Isaac Kohane: (40:41) I've actually been slightly terrified about that aspect of it. Take what today we'd consider the crappiest interactor ever—the ELIZA system that Joe Weizenbaum wrote, which was just a very simple, very shallow grammatical parser. Because a Rogerian therapist basically just plays back to you what you said with a tiny bit of permutation, people would use it as an actual therapist, even though the system really did not know anything. Joe Weizenbaum's own secretary would lock herself in her office for a session with it. It's perhaps heartwarming but also pretty terrifying, and I think it speaks to how much we need that interaction around healthcare. Lest your listeners be deluded about it, especially the younger ones: people with actual health problems have very few people they can talk to. When they see a primary care provider, if they are lucky enough to have one, it's a ten-to-fifteen-minute meeting, and most of their questions go unanswered. When they have another question, it means waiting another six months, during which things could be done, and then they're not done until things are much worse. So there's a huge need.
Nathan Labenz: (42:08) So let's go then to the possible future that we're not quite in yet, which is the torchbearer model. You might also call this the oracle model or even the scientific or research pioneer AI. This was something that I also looked really hard for. One of the first things I tried to figure out is, does this GPT-4 system show anything that seems like it can generate truly new, novel scientific insight? And I came to the same conclusion that you did. You wrote in the book: "Can GPT-4 independently develop testable hypotheses that entail specific therapeutic interventions with a high likelihood of being supported by clinical trials? Currently, the answer is no." I think that's definitely true. But I don't have a great sense of exactly why, or what the barrier is, or how or when that might change. What do you think is the key thing that it can't do right now that ultimately leads to a no on that question?
Dr. Isaac Kohane: (43:09) I think the interesting thing is that it's learning an amazing amount, sometimes in a superhuman way, about the things that we talk about. If there are a lot of human beings talking about it, it will know about it. And I don't want to anthropomorphize too much, but it can do some theory formation around those things that human beings talk a lot about. But for generation of new paradigms where human beings have not talked very much about it, it's a bit at a loss. What kind of data it would need and how to think about that—there's not a lot for it to learn. I do think there are ways to approach it, but right now, this current generation of linguistically supreme but not scientifically supreme models, I think, are limited. At the same time, at the margins, because of this maximum exploitation of what we already know collectively but not individually, it can surprise us. For example, doctors are often stumped. There are millions of patients worldwide, hundreds of thousands in the United States, who are not diagnosed. I have the privilege of being the principal investigator of the coordinating center of something called the Undiagnosed Diseases Network. It's across multiple hospitals—Stanford, Harvard, Baylor, Florida, Duke, and so on. What we find is if you ask the right questions, if you do the right studies, if you do genomic sequencing, you can sometimes—maybe about 34 percent of cases—figure out what the problem was. That way you make a meaningful change for that patient. This is after seeing a lot of top doctors. So I put GPT-4 through the paces, not only on those cases but even cases that have not been solved. I said, "Hey, we found mutations in five genes. Which one of these genes do you think is responsible for this case?" 
And its top-ranked gene was the one we had independently validated as the cause of the disease, through a lot of basic science modeling of the disease, making the mutation in a fruit fly of all things, out of all the other genes that plausibly could have caused it. So it's not quite theory formation, but it's really working at the very hairy edge of what's known, because of its ability to bring together all this knowledge in a focused way for a question.
Nathan Labenz: (46:10) So we have about ten minutes left. I think maybe the final two sections I want to go into are the next evolution of GPT-4. Sam Altman has recently said that they're not yet training GPT-5, but they plan to do a lot more with GPT-4, some of which has already been done and we just haven't seen it, like the multimodality and ability to understand images. I'd love to hear your thoughts on what you think, without just hyperscaling the model 100x more—what are the incremental enhancements? That could be attaching it to systems, or it could be some medical fine-tuning, or it could be more multimodality so it can natively understand scans. What are the things there that most excite you that you think will be most impactful?
Dr. Isaac Kohane: (46:58) I've been a science fiction fan my whole life. Let me try, at least for the first part of this question, not to go there. Here's what I think is going to happen relatively soon. One is using an entire healthcare system's data not for fine-tuning—because right now it's very hard to fine-tune GPT—but to build an accessory set of embeddings, using, for example, the Pinecone framework, to create essentially a knowledge base that GPT can draw on to customize its advice for that healthcare system. Remember what we were talking about before: different hospitals are different, with different populations. One way to address that is to say, here's your large language model, GPT-4 or perhaps Google's Bard, and here's a set of embeddings representing essentially the multidimensional co-occurrence of findings in this patient population. That way, it will be able to give advice that is much more customized to, and deeply reflective of, the practice of medicine in that hospital system than generic GPT-4 or other large language models can. I think customizing it to the knowledge base and the practices of a hospital system is going to be one important area. The other important area is going to be specifically in speech. I think that having omnipresent, multi-speaker speech recognition is going to address the thing that has frankly been causing the most burnout and unhappiness for physicians: they've been turned into documentation clerks. By combining the patient's prior record with listening to everything that goes on between the doctor and patient, I'm fairly optimistic that you can come up with a very good first draft of the clinic note that the doctor can just look at, approve or edit, and then have a version of that note sent to the patient, to the referring doctor, and perhaps even, appropriately, to the insurance company.
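[Editor's note: the embeddings-plus-knowledge-base approach Dr. Kohane describes can be sketched roughly as follows. This is a toy illustration only, not the Pinecone API: the `embed` function is a stand-in bag-of-words vectorizer, and the hospital snippets are hypothetical. A real deployment would use a learned embedding model and a managed vector store, and feed the assembled prompt to the LLM.]

```python
# Sketch of retrieval-augmented customization: embed a hospital's local
# knowledge, retrieve the snippets most relevant to a query, and prepend
# them to the prompt so a generic LLM gives hospital-specific advice.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a learned model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k knowledge-base snippets closest to the query."""
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, knowledge_base: list[str]) -> str:
    """Prepend retrieved local context to the question before sending it to the LLM."""
    context = "\n".join(retrieve(query, knowledge_base))
    return f"Local context:\n{context}\n\nQuestion: {query}"

# Hypothetical snippets standing in for one hospital's practices and findings.
kb = [
    "Our formulary prefers drug A for hypertension in patients over 65.",
    "Radiology backlog: MRI scheduling currently averages three weeks.",
    "Local resistance data: avoid empiric drug B for urinary infections.",
]
print(build_prompt("Which drug for hypertension in an elderly patient?", kb))
```

With the toy data above, the formulary snippet ranks first for the hypertension question, so the model's answer would reflect that hospital's own prescribing practice rather than a generic recommendation.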
Beyond that, I do think the stretch is going to be in discovery, because there are going to be some kinds of theory formation that are very hard for GPT-4 or any of the other leading large language models. But I think there can be constrained parts of theory formation, where you can say, "Can you identify in these data the drug that seems to be the most effective based on X, Y, and Z?" And although that's not full de novo theory formation, it's a focused question that, with the right data, I think it will be able to pull off. But we need the right kind of data, and it's not always obvious what the right kind of data is. It's not clear that the standard transformer model as currently implemented will work well for those applications. But we'll see.
Nathan Labenz: (50:46) You allude a little bit to the integration of, say, an AlphaFold-type system—something that natively groks protein structures that could be integrated with language. Paint that picture a little bit more for us as well.
Dr. Isaac Kohane: (51:01) Of course, the best science fiction outcome, which I think we will reach in probably ten to fifteen years, maybe earlier, is this: I have a patient with this cancer with the following mutations, both in the germline and in the tumor. What's the right drug for this patient? And if we don't have any drugs, what's the right drug to develop? Right now AlphaFold, which is a kind of language model because its language is the language of amino acid sequences, with a little bit of physics added but mostly the large language model, is able to solve the structure of almost all proteins. It's already being used to understand how small molecules dock. There are already beginnings at DeepMind, which has been leading in this area. They, understandably, haven't been totally transparent about the next generation, but it's going to be around large protein interactions. And I do think that something like simulating a cell will in fact be possible in a five-to-ten-year time frame. Well before that, though, in a five-year time frame, we're going to be able to do a much better job of saying: here's a drug company with a bunch of interesting drug leads to treat this cancer, or to treat Alzheimer's; which of these are most likely to be effective and cause the fewest side effects? Bringing together the knowledge of structure from something like AlphaFold and its descendants with the knowledge of disease progression you get from large hospital datasets, that multimodal linking will let us get a much better yield, taking candidates all the way from a possible lead compound to something that makes it through Phase 1, 2, and 3 with enough efficacy and few enough side effects to advance the project. I think that will be transformative, and I think we will see it within the next five years. But it's not just scaling up.
Nathan Labenz: (53:31) I think that's undeniably a super exciting and inspiring vision. The last question I wanted to ask is kind of on the path from here to there. You alluded to there's going to be all sorts of conflict. There's going to be all sorts of noise. But I was struck by a couple of short quotes from the book. One that kind of surprised me: you said you believe in obtaining patient data from patients as opposed to from systems. That got me wondering, why is that? And ultimately, who do you think the AI should be working for? Do we all end up with the hospital system having its AI and the doctor having theirs and the patient having their own? Or is there one that we all kind of work with together? The dynamics of kind of who the AI is aligned to and whose interests are primary seems very under-theorized. I'd love to get your thoughts.
Dr. Isaac Kohane: (54:23) No one knows how it's going to settle out in the long term. But I think in the mid-term you're going to have AIs that are aligned with different organizations. Yes, there will be an AI aligned with the insurance company, an AI aligned with the hospital, and an AI aligned with patients, and there will be different markets for those. In the end, as a human being, I hope, and I think there would be the most public support for, consumers having the alignment be with them. So I personally feel that if I am able to donate my data to a large language model or any other kind of AI, and it's aligned with me, I'm much more comfortable giving it my data than giving it to another AI which is not necessarily aligned with me. Just as I'm much more likely to tell my doctor it's okay to use my data to get a second opinion from a colleague, I'm much more likely to want to share my data with an AI that is working for me. And by the way, through work that Apple has done, which we were actually quite involved with a few years ago, Apple Health now has relationships with 800 hospitals, growing by hundreds every month or so. You can download all your medications, labs, diagnoses, procedures, and demographics, and soon clinical notes, and you can repurpose them, just as you can repurpose the images from the camera on your iPhone. This now allows tens of millions of consumers to potentially contribute their data. And because there are alignment issues in the way healthcare is organized, medical alignment issues, not end-of-civilization alignment issues, I do think that asking patients for consent for the use of their data will give us the biggest buy-in, because patients want their data to be used, but for the right reasons. And I think that's a question of patient and human autonomy.
That's why we're fortunate to be in this era where more and more of our data has been liberated and is made available to us as consumers. And I think in that mode our AI assistants should get it from us rather than from entities that may not be as well aligned with us.
Nathan Labenz: (57:15) I regret only that we don't have a lot more time to dig into this in even deeper detail, but the good news is you have written a whole book on this. The book is The AI Revolution in Medicine: GPT-4 and Beyond. I really think it's an exemplary study and applaud you for all your work on it. Professor Isaac Kohane, thank you for being part of the Cognitive Revolution.
Dr. Isaac Kohane: (57:39) Nathan, thank you. And now I know which podcast to listen to.