Gene Hunting with o1-pro: Reasoning about Rare Diseases with ChatGPT Pro Grantee Dr. Brownstein
Nathan explores the cutting-edge intersection of AI and rare disease research with Dr. Catherine Brownstein of Boston Children's Hospital and Harvard Medical School. In this episode of The Cognitive Revolution, we dive into how frontier AI models are revolutionizing the diagnosis of rare diseases. Join us for an insightful conversation with a ChatGPT Pro grant winner who's pioneering the use of AI to help patients find answers faster.
Help shape our show by taking our quick listener survey at https://bit.ly/TurpentinePulse
Check out Modern Relationships, where Erik Torenberg interviews tech power couples and leading thinkers to explore how ambitious people actually make partnerships work. This season's guests include: Delian Asparouhov & Nadia Asparouhova, Kristen Berman & Phil Levin, Rob Henderson, and Liv Boeree & Igor Kurganov.
Apple: https://podcasts.apple.com/us/...
Spotify: https://open.spotify.com/show/...
YouTube: https://www.youtube.com/@Moder...
SPONSORS:
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance, with compute costing 50% less and outbound networking 80% less than other cloud providers. OCI powers industry leaders like Vodafone and Thomson Reuters with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before March 31, 2024 at https://oracle.com/cognitive
NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive
Shopify: Dreaming of starting your own business? Shopify makes it easier than ever. With customizable templates, shoppable social media posts, and their new AI sidekick, Shopify Magic, you can focus on creating great products while delegating the rest. Manage everything from shipping to payments in one place. Start your journey with a $1/month trial at https://shopify.com/cognitive and turn your 2025 dreams into reality.
Vanta: Vanta simplifies security and compliance for businesses of all sizes. Automate compliance across 35+ frameworks like SOC 2 and ISO 27001, streamline security workflows, and complete questionnaires up to 5x faster. Trusted by over 9,000 companies, Vanta helps you manage risk and prove security in real time. Get $1,000 off at https://vanta.com/revolution
CHAPTERS:
(00:00:00) Teaser
(00:00:56) About the Episode
(00:04:45) Rare Diseases Common
(00:06:48) Patient Journey
(00:12:57) Genome Sequencing
(00:19:39) Sponsors: Oracle Cloud Infrastructure (OCI) | NetSuite
(00:22:19) Diagnosis Process
(00:30:50) Data Pipelines
(00:35:51) Sponsors: Shopify | Vanta
(00:39:07) Interaction Graphs
(00:42:18) Data Accessibility
(00:43:42) AI in Pipelines
(00:45:40) LLM Impact
(00:48:40) Anomaly Detection
(00:52:07) Data Sharing
(00:58:49) Data Reform
(01:02:41) AI's Potential
(01:04:30) AI Applications
(01:06:57) Prompt Engineering
(01:14:51) Model Comparison
(01:19:16) Prompting Insights
(01:22:14) Move 37 Analogy
(01:24:34) Future Potential
(01:29:27) Future Experience
(01:32:39) Outro
SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://www.linkedin.com/in/na...
Youtube: https://www.youtube.com/@Cogni...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...
PRODUCED BY:
https://aipodcast.ing
Full Transcript
Nathan Labenz: (0:00) Rare diseases are quite common, actually. There's more people with rare diseases in the United States than there are natural blondes. We need to sequence the whole world in order to understand what is actually disease causing and what is just background variation. Logging into the Harvard Library, getting that paper, skimming the abstracts is not at all what I want. Then going back and being able to ask for a summary, be like, oh yeah, this sounds good, has changed my life. It's all cutting down on this mundane, time consuming, really tedious part of the job and getting back to the fun part of gene discovery. There's going to be this whole generation of geneticists that aren't going to know how things were done before all this was available because it's going to be a huge game changer and time saver.
Nathan Labenz: (0:55) Hello, and welcome back to the Cognitive Revolution. Today, I'm speaking with Dr. Catherine Brownstein, MPH, PhD, and Assistant Professor at Boston Children's Hospital and Harvard Medical School, whose research focuses on identifying the genetic causes of previously unexplained rare and orphan diseases, and who was recently awarded a ChatGPT Pro grant from OpenAI. You might be surprised to learn, as I was, that so-called rare diseases are not necessarily all that rare. Any disease affecting fewer than 1 in 2000 people or fewer than 200,000 people in the United States is classified as a rare disease. And often families spend painfully frustrating years bouncing around the medical system in search of an accurate diagnosis before ultimately reaching Dr. Brownstein's elite team at Boston Children's. Of course, considering the radical cost reduction that we've seen in genetic sequencing in recent years, which with nearly a 10,000x improvement in affordability is one of the very few cost curves ever to rival that of large language models, there's been an ongoing revolution in this space even before the current AI moment. In 2007, a genome sequence cost upwards of $1 million. In that era, it was used only in the most challenging cases and was often a difference maker. Today, it's just a couple hundred dollars and has become commonplace for individual patients. But that creates new challenges for specialists like Catherine who now have to comb through a vast and still exponentially growing literature to find candidate diagnoses for their most challenging cases. This new wealth of information, which as you'll hear could be growing even faster still with improved regulations and incentives, makes information processing capacity relatively scarce and valuable. And you can probably guess where this is going, a great target for the latest generation of reasoning models. 
This conversation then is above all a window into how frontier large language models are starting to become useful in highly specialized fields. Dr. Brownstein is pioneering the application of AI to rare disease research in real time. She's using AI to triage potentially relevant research and in some cases to connect the dots between subtle clues. She's working directly with OpenAI to develop use cases and provide feedback. And considering that every case represents a real person with a life-altering or even life-threatening condition, she's constantly working to find the right balance between enthusiasm for AI's capabilities and a healthy skepticism for any specific AI output. As you'll hear, she's still figuring out where AIs can be the most valuable, how best to use them, and how much to trust them. That such an established expert is bringing what amounts to a beginner's mindset to such high-stakes cases may be surprising to some, but really, I don't think it should be. Even the most AI-obsessed folks like me have only managed to log a few thousand hours with large language models, and nearly all of that was with earlier and less powerful models. So for the current frontier, we're all still figuring this out together, and there's currently an unprecedented opportunity for people with deep experience in specific niche domains to become the leaders in applying AI to their particular fields. Importantly, if you are listening to this podcast, this is probably something that you can personally do, even if you're still relatively new to the AI tools themselves. If you're inspired to take on that challenge and think I might be able to help, or if you have any feedback or guest or topic suggestions, please don't hesitate to send me a note. One of the very best parts of doing this podcast is hearing from you, the listeners, and I am pretty consistent about reading and responding to every message I get. 
Of course, we always love it when listeners share the show with friends online or offline, and we very much appreciate the many reviews we've received on Apple Podcasts and Spotify as well as the increasingly active comment section on YouTube. For now, I hope you enjoy this conversation, which I hope will become the first in a series of episodes with ChatGPT Pro grant winners, on the application of frontier AIs to high-stakes medical research with Dr. Catherine Brownstein, MPH, PhD, and Assistant Professor at Boston Children's Hospital and Harvard Medical School. You are specializing in the discovery of new genes for rare and orphan diseases, and you've recently been awarded a ChatGPT Pro grant. Welcome to the Cognitive Revolution.
Dr. Catherine Brownstein: (5:02) Thank you so much for having me.
Nathan Labenz: (5:04) Yeah. I think this is going to be really exciting. As regular listeners know, I have a growing obsession with the intersection of AI and biology. And so when I saw your name on the o1 Pro blog post announcement, I was excited to reach out and learn more about how you are applying the latest AI tools to some of these very challenging and pressing problems. Let's start with just a zoomed-out, step-back overview of your work, because our listeners are definitely following AI developments. They know about o1. They know about o1 Pro. Probably quite a few have subscribed even at the $200 a month level, but they probably don't know a lot about rare diseases and what our state of knowledge is, what sort of techniques people use to try to figure these things out. So I'd love to just get, you know, this might be a very tough question, maybe the toughest question, but what's kind of the layman's introduction to your advanced work?
Dr. Catherine Brownstein: (6:01) So when I'm asked that question, I usually answer it by saying that rare diseases are quite common, actually. There's more people with rare diseases in the United States than there are natural blondes. So it's actually quite common to have a rare disease. And a lot of it's how you define disease. Autism is really common, but autism due to a de novo variant in KCNJ8 is quite rare. It's kind of a tricky definition and it's always evolving as we learn more, but basically, I consider myself a gene hunter and trying to diagnose the undiagnosed.
Nathan Labenz: (6:48) I read in preparing for this, I think it was Perplexity gave me this answer that the definition of a rare disease is one that affects fewer than 200,000 people in the United States. That was a surprisingly large number.
Dr. Catherine Brownstein: (7:03) Isn't it? It's always wild to me because when you think of a city that's 200,000, that doesn't seem like a small town, or at least it doesn't to me. But that's the definition of rare in comparison to common disease, which can affect millions or epilepsy is one percent of the population.
Nathan Labenz: (7:26) Yeah, that's really interesting. So maybe a little bit more color, kind of background on maybe the patient's journey through the medical system to get to you and then sort of your experience of encountering new patients. I know it's of course going to be impossible to give one story because I'm sure they're extremely varied but how do you end up coming into contact with patients? What have they gone through to get to you and what do you do to get a new case?
Dr. Catherine Brownstein: (8:00) I'm really lucky to be at Boston Children's, which is an internationally known tertiary hospital. So we get really interesting cases from all over the globe. Usually a patient or family starts out where they go to their local medical provider and they can't figure out what's wrong with the child or person, and then they get referred from specialist to specialist, and they still can't figure out what's wrong, and then eventually they get to us, where a lot of times patients and families have been bounced around for years trying to figure out what's going on and what's next and what can they expect and just looking for answers. So sometimes it's not that case. We have a lot of really savvy, medically savvy families where they know their child and they know something's wrong and they need the best right away and they're going to search on the web and find the person who works on that phenotype and call every day until they get an appointment. But a lot of times, it's a more circuitous route and going from doctor to doctor to doctor, and then finally somehow ending up at Boston Children's. And then if they see a clinician who doesn't know, they often refer the case to the organization that I work with, the Manton Center for Orphan Disease Research, and we get a lot of the negative cases throughout the hospital where they think it's genetic in origin, and then we're able to get the medical records. We're a philanthropically funded virtual center, and patients can self refer. So then we get all the medical records, all the genetics that have been done before, and then we have a huge multidisciplinary team, and we review the case, go through it, do reanalysis. Sometimes we resequence or do a new technology if one is available like RNA-Seq or long read sequencing, and then we work together to try and figure out what's going on. When I first started in 2011, genome sequencing, exome sequencing was quite rare. 
So if patients were able to get it, a lot of times it was like shooting fish in a barrel. We would have something like an 80 percent diagnosis rate. But now genome sequencing, next generation sequencing is so common that we only see the families if they've already had a negative sequencing test. So we go from diagnosing 80 percent now to like 10 percent just because we're getting the most difficult of the difficult cases that have already been reviewed by really good geneticists and getting to us where they just can't figure it out. But that's one thing that I think AI can really address is shortening this diagnostic odyssey for patients that really just have been jerked around, not through anyone's fault, but just by the nature of how these things go. And maybe AI can help in analyzing symptoms, or maybe you should see this doctor right away, or maybe you need this test, or go to this specialist and just make things happen a lot faster.
Nathan Labenz: (11:30) That call back to 10 years ago, I think, is quite interesting. Maybe you could give us a little bit of a sense of the relative pass through rates at these different levels of filter. People go initially to their local doctor, the local doctor doesn't know what's going on, they get referred, eventually they get to your hospital, you've got the best of the best there. And there's another related but distinct line of research recently that has been comparing AI's ability to diagnose through a natural language conversation with patients against doctors, and it seems like against at least the average doctor, the latest models are now very much holding their own. I don't know if that would be true if we were looking at the Boston Children's elite clinicians and their ability to diagnose but they still don't know what's wrong. In the past, if I understand things correctly, because the sequencing was rare, you could often just do a full genome sequence and then be like, oh, okay, well, there's your problem. The literature has sort of characterized this. Now that we have this additional information, there's a pretty clear match. And now today, that low hanging fruit is getting absorbed somewhere else in the system before it gets to you, and you're now seeing things that are basically not characterized in the literature at all or maybe just a little bit. I'm not sure why the connection wouldn't be one that others could make, but yeah, use that prompt and fill in a little more detail if you would.
Dr. Catherine Brownstein: (12:58) When I started, I was actually hired as a project manager at Boston Children's to help clinicians get their patient sequenced. So clinicians, even though they weren't geneticists, were really good at identifying cases that were probably genetic in origin. So they had freezers full of this DNA just waiting for the technology to come online where they could analyze it and figure out if there was something genetic that could be discovered. When I started, an exome, which is just one percent of the genome, it's just the coding region, just the genes, I mean, it's a good place if you're being economical because a lot of the variants are within the coding region. So an exome, again one percent of the genome, was $3,800 or it was close to $4,000. Now, I just priced out an exome, it's $160 for that exact same test. So it was so expensive that they went through a rigorous selection process if you were going to get an exome done. It was pretty much if you were going to bet money, you were going to bet money that it was genetic and you were going to be able to figure it out by doing a trio, that is the patient and the parents, and you're going to see something that's de novo, which is basically not in the parents but it's in the child, so it's like lightning striking, an error happening during development which causes disease. So the first ones of that were that 80 percent that I was talking about, and it was because these patients were collected, some in some cases 20 years ago, and they had the DNA there, and sure enough, there was a premature stop codon or a huge deletion of one gene that was already hypothesized to be related to this condition or a similar condition, and you could point at it and be like, yep, that's it. 
And then also, you would have multiples of the same type of case where you'd see four families with the same gene missing, with the same phenotype, and then you're really confident that that gene is causative of the condition. As the price dropped, that became less of a thing that happened. And that's because by then you could get an exome done almost anywhere: there are a lot of geneticists, a lot of really savvy clinicians, a lot of for-profit companies that you could send off to and get a report back, get diagnosed, have more precision medicine treatment, and go on your way and do very well. So it was the negative cases that were getting referred up the chain to Boston Children's, where they've already had a genome and it came back as negative, that is, there was no obvious variant in a known gene that could explain what was going on. So then it becomes a little trickier. We start forming cohorts. At the Manton Center, we work with clinicians. We have clinicians in every department of the hospital who are able to refer patients to us, we consent them to our protocol, and then we collect samples and medical records, and we sometimes wait and reanalyze, and when we have four patients, 10 patients with the same thing, we're able to look at them together as a whole group and ask, all right, are there things in the same gene, the same family of genes? Can we come up with a hypothesis here? And in 2014, Zak Kohane, who I think had been a previous guest on your podcast, had the idea of having an international competition to solve undiagnosed families. So we got three families with seemingly Mendelian disorders, that is, we think they're genetic, with a clear relationship between gene and condition. And we released their data all over the globe to 23 different teams, and we had them compete and each submit a report on what they thought the cause of each family's condition was. And it was really interesting.
A lot actually were diagnosed from this. I think two out of three actually walked away with diagnoses from this process. And also, we were able to show that diverse teams did much better, that you can't just have a bunch of bioinformaticians in a room together looking at cases and expect them to come up with the right answer. It was teams that had a mix of research assistants, genetic counselors, researchers, clinicians, research clinicians, and clinical geneticists working together, and with all those diverse perspectives they were, on the whole, able to solve more cases. So that was really interesting. And I think that's a recurring theme here when we're talking about large language models and AI. None of this exists in a vacuum. It's helping us along, and maybe it will be enough, but right now, with multidisciplinary, multi-strength teams all working together, we do much better as a whole. The other thing I wanted to add is we still see those slam dunks, though. We just had a case a little while ago where it was one family, three generations, all with a rare bone disorder, and the patriarch was in his 90s, and we were able to give a diagnosis to this person in their 90s, which I thought was really cool and kind of shows the power of just having an answer. He had already gone through surgeries he didn't need to go through and had lived his whole life with this condition. But something as simple as being able to explain what's going on in 10 seconds, as opposed to three minutes of describing symptoms, means a lot to the family.
Nathan Labenz: (19:34) Yeah. Imagine, especially if you've been dealing with something like that for 90 plus years. That's crazy to think about. Hey, we'll continue our interview in a moment after a word from our sponsors. It is an interesting time for business. Tariff and trade policies are dynamic, supply chains squeezed, and cash flow tighter than ever. If your business can't adapt in real time, you are in a world of hurt. You need total visibility from global shipments to tariff impacts to real time cash flow, and that's NetSuite by Oracle, your AI powered business management suite trusted by over 42,000 businesses. NetSuite is the number one cloud ERP for many reasons. It brings accounting, financial management, inventory, and HR all together into one suite. That gives you one source of truth, giving you visibility and the control you need to make quick decisions. And with real time forecasting, you're peering into the future with actionable data. Plus with AI embedded throughout, you can automate a lot of those everyday tasks, letting your teams stay strategic. NetSuite helps you know what's stuck, what it's costing you, and how to pivot fast. Because in the AI era, there is nothing more important than speed of execution. It's one system, giving you full control and the ability to tame the chaos. That is NetSuite by Oracle. If your revenues are at least in the seven figures, download the free ebook, Navigating Global Trade, Three Insights for Leaders at netsuite.com/cognitive. That's netsuite.com/cognitive.
Nathan Labenz: (21:09) So a lot of different questions I have about all this, but in these cases where you're getting all the way through the entire medical system, basically, and finally getting to one of these cross functional teams. Can you tell us a little bit more about what the process looks like when that team gets to work? In AI prompting, we talk about thinking step by step and breaking problems down. Maybe one way to frame it would be like, what is the sort of collective chain of thought that the group goes through to take start with inputs. Inputs would at least be symptom descriptions and results of genetic testing sequences. I don't know if there's any other inputs that you guys get at that level. I guess you have the whole scientific literature also as sort of an input. And then you do some thinking, reasoning process, maybe some additional testing, and finally you get to a result. What are you guys doing when you're doing that?
Dr. Catherine Brownstein: (22:05) Okay. So when a case comes across my desk, usually there's a medical record that comes along with it, because again, they've been bounced around for a long while, and usually at this point now they've had some genetic testing that we get transferred to us. More and more patients are coming with it on a thumb drive, and here's my genome, which I think is really cool and didn't even happen a few years ago. And we run it through our genomic pipelines, and usually we run it through more than one because they all have their strengths and weaknesses. Some are more comprehensive, but they're harder to use, and you'll get more false positives because they rule fewer things out. And then you have others that are really easy to use. My kids can use them and understand intuitively what it means. But sometimes they're black boxes and you don't know the reasoning behind why a variant was eliminated or not. I'm a PhD, non-MD, so I usually like things to stay anonymous. I don't want to be a walking HIPAA violation, so I kind of don't want to know the names or meet the families, but sometimes I do. I know who they are. And we go through everything case by case and line by line. There's certain phenotypes where I think more information is better. You'll get the occasional phenotype that's only linked to one condition, like lack of tears, and then that's a really important clue, and then we'll look at that gene. And then also for what the patient is experiencing overall, generally there are gene lists of what has already been discovered, and you can look at the genomic information for any variation in those genes that could be causing disease; we'll call it, for simplicity, pathogenic variation, though suspected pathogenic variation is probably more accurate. So you get the new analysis done, and then you're looking at what's known. And then if you don't see anything, then you start looking at your special sauce.
All right, how am I going to approach this? You look at what else in the genome is notable. Is there a huge structural change that hasn't been linked to disease, or a translocation, where chromosomes break and reattach in the wrong spots, or is there some other deletion or duplication? What's rare? What's unique to this patient? Then there are also all these new technologies, like looking at epigenetics, where you can predict which genes are turned on and off. And even if you can't see a mutation, is the expression of the gene of interest perturbed somehow, where it's constitutively on even though it's not supposed to be? Can you take a look at that? Sometimes in the back of your mind, you're like, okay, is it multifactorial? It's not just one gene impacting it. It's not some big error in one gene. It's a bunch of tiny little things scattered throughout the genome. And then you try a different type of test, like a GWAS, a genome-wide association study, or SKAT, a sequence kernel association test, where you can look at rare variation weighted by how rare it is and how damaging it's predicted to be to the protein, and see, okay, is there some reason to think that this is what's going on? And a lot of times still, let's say 25 to 33 percent of genetic testing comes back positive. So that means 67 to 75 percent are negative. You go through this whole process and still most are negative. And then you put it on the shelf, you wait a little bit, and you analyze it again a year later. Reanalysis is actually really important because things get discovered all the time. You can't be an expert in every gene, every condition, every structural variation, and other people are actively working on it, and a lot of times you'll take something off the shelf and look at it again and it rises right to the top. The number one thing in the genomic browser is the answer. And you stared at it a year ago and didn't make that connection.
Now all of a sudden there is an answer. Actually, I was asking Alan Beggs and Monica Wojcik, who are the director of the Manton Center and the medical director of the Manton Center for success stories, if they had any that stuck out. And one was 20 years ago, it was three siblings all passed away from a type of myopathy and they couldn't figure it out. And they kept testing and testing and testing and eventually ran out of DNA. And then we had a pilot grant at the hospital to do RNA-Seq, and Alan and Monica submitted this family because we had some RNA left and found a variant in CFL2, I think, that's the gene. And even though it was 20 years ago, the surviving siblings were now planning families, and they had an answer, and they could do genetic testing to make sure that there weren't two variants, that they're each carriers of one variant. They didn't pass away, so they only had one variant, not two. And that their partners didn't have a variant in the same gene and could ensure that the next generation wasn't going to have this horrible fatal myopathy. So in some ways, and then we had an interesting discussion like, okay, is that really a success story? Because whenever there's multiple deceased, is that really a success? Yes, you diagnosed it, but it's not changing anything. But it is changing the future. They're going forward with their eyes wide open and being able to plan as a result.
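The rarity-and-damage weighting Dr. Brownstein mentions for rare-variant tests can be pictured with a toy burden-style calculation. This is only an illustrative sketch: SKAT proper is a kernel-based score test, which this simple weighted sum does not reproduce, and the weighting function, damage scores, gene names, and variant data below are all made up.

```python
import math

# Toy illustration of upweighting rare, predicted-damaging variants, in the
# spirit of burden/SKAT-style rare-variant tests. All data here is invented;
# real tests use curated allele frequencies and damage predictions, and SKAT
# itself is a kernel-based score test rather than this simple burden sum.

variants = [
    # (gene, minor allele frequency, predicted damage score 0-1, carriers among cases)
    ("GENE_X", 0.0001, 0.95, 3),   # hypothetical gene, very rare and damaging
    ("GENE_X", 0.01,   0.20, 1),
    ("GENE_Y", 0.30,   0.10, 5),   # common, mild variant
]

def weight(maf):
    """Upweight rarer variants (one common heuristic: inverse std. dev. of allele frequency)."""
    return 1.0 / math.sqrt(maf * (1.0 - maf))

def gene_burden(variant_rows):
    """Sum rarity-and-damage-weighted carrier counts per gene."""
    scores = {}
    for gene, maf, damage, carriers in variant_rows:
        scores[gene] = scores.get(gene, 0.0) + weight(maf) * damage * carriers
    return scores

scores = gene_burden(variants)
top_gene = max(scores, key=scores.get)
print(top_gene)  # GENE_X dominates: its very rare, highly damaging variant outweighs GENE_Y
```

The point of the sketch is just the intuition from the conversation: one ultra-rare, highly damaging variant can outscore a common, mildly damaging one carried by more patients.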
Nathan Labenz: (28:49) Yeah. That sounds like certainly some form of success to me. I have three young kids and fortunately no crazy medical conditions in my family, but we still did a little bit of genetic testing. I would say I was probably never more nervous than when opening that report just to make sure that I wasn't about to see something really weird or strange that would change the course of my life. So to be able to be on a potentially negative course and get the assurance that you could get on a confidently a path where you'd be able to have healthy children, I think sounds, to put it mildly, I would say a life changing development for those folks that definitely resonates with me. Okay. Let me dig back in a few points along the way in terms of how you've got this all. Maybe try to summarize a little bit and interject a couple questions.
Dr. Catherine Brownstein: (29:38) Okay.
Nathan Labenz: (29:39) The pipelines that you're describing, those are, I guess, maybe a mix of maybe commercial options or other academic groups have put things out.
Dr. Catherine Brownstein: (29:48) Mhmm.
Nathan Labenz: (29:48) The inputs to those, are they highly structured data? I mean, I'm kind of thinking here my sequence is, of course, structured.
Dr. Catherine Brownstein: (29:58) Mhmm.
Nathan Labenz: (29:58) My symptoms are not. Right? I describe myself in words. The doctor I'm talking to kind of notes that in words. Is there a way where that gets translated to specific coded sets of symptoms? Or what is the intake of these pipelines? And then are they basically doing deterministic work where they're sort of running essentially running down a long checklist and saying, if you have this, we check this, but you don't have that, so that's out and sort of just working down a long set of known possible conditions? Or how would you characterize what those pipelines are doing internally? Nathan Labenz: (30:36) I think you're exactly right. A lot of them take the phenotype input, which is coded to ontologies, sometimes HPO codes, or ICD-9, ICD-10, sometimes SNOMED. There are a bunch of different ontologies. I like HPO the best.
Nathan Labenz: (31:01) To be really clear, in those sorts of ontologies something like "no tears" would be a single item.
Nathan Labenz: (31:07) Alacrima.
Dr. Catherine Brownstein: (31:09) And then my condition might be summarized by a set of those. If I had no tears and hair falling out and loose teeth or whatever, that would be like three things.
Nathan Labenz: (31:20) Yeah.
Dr. Catherine Brownstein: (31:20) That would be like, okay, patient presents with this bundle of things.
Nathan Labenz: (31:23) It's a huge field of research too. My friend Melissa Haendel is working with HPO at the Monarch Initiative, being able to map that onto animal phenotypes and making sure it's one-to-one. Humans don't have paws, but the phenotype that is most close to that needs to be translated. And there's also layperson HPO where we're not saying alacrima, but we say "no tears," or "lazy eye" instead of strabismus. There's a whole mess of work that goes into that and making sure that it's accurate and also culturally sensitive. There are many considerations for epilepsy. It's all this stuff that you never think of, and if you don't make those translations then your phenotype is way less accurate than it could be.
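The layperson-to-ontology translation described above amounts to a synonym lookup from free-text phrases to coded terms. A minimal sketch, with placeholder term IDs rather than real HPO identifiers:

```python
# Minimal sketch of layperson-phrase -> HPO-style code translation.
# The IDs and synonym entries below are illustrative placeholders,
# not the actual contents of the HPO ontology.
LAY_TO_HPO = {
    "no tears": ("HP:XXXXXXX", "Alacrima"),
    "lazy eye": ("HP:YYYYYYY", "Strabismus"),
    "loose teeth": ("HP:ZZZZZZZ", "Premature loss of teeth"),
}

def encode_phenotype(lay_phrases):
    """Map free-text layperson phrases onto coded ontology terms."""
    coded, unmapped = [], []
    for phrase in lay_phrases:
        hit = LAY_TO_HPO.get(phrase.strip().lower())
        (coded if hit else unmapped).append(hit or phrase)
    return coded, unmapped

coded, unmapped = encode_phenotype(["No tears", "lazy eye", "hair falling out"])
print(coded)     # two coded terms
print(unmapped)  # ["hair falling out"] stays free text for manual review
```

Anything that fails to map stays as free text for human review, which is exactly where the accuracy and cultural-sensitivity work she describes comes in.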
That gets incorporated into the model, and then when you input that with the genetics, you can have raw data, which is FASTQ files, the raw reads that come off the machine. Then you have BAM files, which are those reads aligned to a reference. I'm not going to explain this very well, but then you have the VCF, which is really processed data. It's basically every single variant, and it's still a relatively huge file. I mean, it's orders of magnitude smaller than a FASTQ or a BAM, but it's still quite big.
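For a sense of what "really processed data" means, a VCF data line is tab-separated text. A minimal parse of one invented line is shown below; real pipelines use dedicated libraries (e.g. pysam or cyvcf2) that handle the header and the many INFO-field subtleties:

```python
# Minimal sketch of reading one VCF data line. This only shows the
# shape of the data: chromosome, position, ref/alt alleles, and INFO
# tags. The example line itself is invented for illustration.
def parse_vcf_line(line):
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    info_tags = dict(
        kv.split("=", 1) if "=" in kv else (kv, True)
        for kv in info.split(";")
    )
    return {"chrom": chrom, "pos": int(pos), "ref": ref,
            "alt": alt, "filter": filt, "info": info_tags}

rec = parse_vcf_line("chr16\t1203817\t.\tC\tT\t50\tPASS\tAF=0.0001;DP=42")
print(rec["info"]["AF"])  # "0.0001"
```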
You put FASTQ or BAM or VCF into these pipelines which process the data along with the phenotype, and then it's ordering the variants based on the HPO code related to the gene, related to the variant within that gene, and how likely it is to be causative of disease. The more sophisticated ones can take in these relational things where it's known that this gene binds to this other gene. Gene A is related to the phenotype, gene B isn't yet, but there's a huge variant in gene B, and the patient has the phenotype associated with gene A.
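The prioritization logic described here, scoring a variant by its gene's link to the patient's phenotype and boosting genes whose known interaction partners carry that link, can be sketched roughly like this. All gene names, scores, and links are invented, and real tools are far more sophisticated:

```python
# Toy sketch of phenotype-driven variant ranking. Gene-phenotype
# links, the interaction graph, and deleteriousness scores are all
# made-up inputs for illustration.
GENE_PHENOTYPES = {"GENE_A": {"HP:0001"}, "GENE_C": {"HP:0002"}}
INTERACTS_WITH = {"GENE_B": {"GENE_A"}}  # GENE_B binds GENE_A

def score_variant(gene, deleteriousness, patient_terms):
    # Direct: the gene itself is linked to the patient's phenotype.
    direct = bool(GENE_PHENOTYPES.get(gene, set()) & patient_terms)
    # Indirect: an interaction partner is linked to the phenotype.
    indirect = any(GENE_PHENOTYPES.get(p, set()) & patient_terms
                   for p in INTERACTS_WITH.get(gene, set()))
    phen = 1.0 if direct else (0.7 if indirect else 0.1)
    return deleteriousness * phen

patient = {"HP:0001"}
variants = [("GENE_B", 0.95), ("GENE_C", 0.9), ("GENE_A", 0.3)]
ranked = sorted(variants, key=lambda v: -score_variant(*v, patient))
print([g for g, _ in ranked])  # ['GENE_B', 'GENE_A', 'GENE_C']
```

This is the shape of how a huge variant in gene B can rise to the top of the list even though only its interaction partner, gene A, is tied to the phenotype.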
My actual first ever success story was one of those cases where it's called episodic ataxia. Our patient was getting really stiff and couldn't move, would get locked in position. We did sequencing and saw that it was a variant in KCNA1, which wasn't the gene that we were thinking of, but it was related to the gene that we thought it was going to be. And so there, KCNA1 just rose to the absolute top of the list, which was really cool.
Nathan Labenz: (34:40) We'll continue our interview in a moment after a word from our sponsors.
That challenge of basically what is the graph of interactions and what affects what in the cell, or at the tissue level or system level, whatever, has been a fascinating area for me recently. I've been really interested to see some new projects. I don't know if you've come across these yet, but there are some that are now trying to predict the evolution of essentially the transcriptome or cell state from kind of one time stamp to the next. And I think that really suggests a major revolution coming soon.
How much would you say... I don't think there's any answer to this because I don't think we know how much we don't know. But when it comes to those interaction type things, my sense has been that we have a relatively small amount of that space illuminated today. Of all the interactions, of all the things where something in this gene, because that interacts with the other thing, could cause a third thing downstream, my sense is that we have a pretty small percentage of those pathways mapped out and well enough understood that we could do this kind of analysis. Is that a good summary, or how would you improve on my summary?
Dr. Catherine Brownstein: (36:01) No, I think that's totally right. Every time I try to look at the impact of a variant on the protein, I'm surprised at how, well, first of all, user unfriendly a lot of these tools still are. And it's because they're really tough. They're cutting edge, and protein folding has come a long way. It's definitely super cool and the people who work on that are totally hardcore. But there's still a lot to be learned, and we're still working out how to fold certain proteins. We don't have everything worked out yet.
I just keep thinking about the first time we got genome sequencing and how difficult it was to use some of these browsers and they would crash the computer. I think protein folding and some of these tools, like STRING-DB, protein-protein interaction databases, they're amazing, and they're going to continue to get more and more amazing and more useful as time goes on, especially when they get more user friendly for people like me.
Nathan Labenz: (37:12) Yeah. It sounds like that might be a real low hanging fruit. This has come up on a couple different episodes where the general observation has been biologists are not programmers, and doctors are not programmers. There's a missing layer that would unlock a lot of value if we could just make it a lot easier for doctors and biologists to use the models and other information tools that have recently been created. But a lot of times those are still kind of put out there in open source project form and they need a UI layer on top or an orchestration layer on top to really make that accessible and useful for a lot more people. So that could be an interesting area for somebody to dig into more.
Dr. Catherine Brownstein: (37:57) Yeah. And just little things like, I got some sequence back from a new company and they're like, okay, here's the commands to download your data, and I'm like, whoa, whoa, whoa, what? And they had no intention of helping me. Either I learned the command line and how to get my data from their server down to mine, or I didn't get my data. So I had to have a crash course on getting onto the Harvard, the Boston Children's supercomputer in order to get my data. It was a huge waste of time. I think they're assuming a level of computer literacy that people just don't have. You can argue that I should, being in the job that I'm in, but it's hard. It's a learning curve.
And yeah, I think there's a lot of opportunity there for making things a little more friendly. And it goes back again to, you don't know what you don't know. If you make your tool accessible to a wider audience, people are going to apply it in ways you never dreamt of. So gatekeeping it to only people who know Unix is kind of tough on everybody.
Nathan Labenz: (39:21) Let's circle back to that in a second because this sounds like one of the candidate areas where you might be getting some good value from your o1 pro grant. These pipelines, are they using any sort of predictive AI technology, like classifiers, things like that, or are they kind of working off a sort of accepted known literature of findings? Because I could imagine, and maybe it varies across providers, but I could imagine one form of pipeline is like, we want to be just really grounded in things that are very well established, and we're going to run down this super long checklist programmatically for you and try to find things that fit. And I could imagine another pipeline that would be like, if these models exist, I'm not sure to what degree they do, you could say, hey, here's my genome, predict and give me guesses. Are there models like that? And I guess, to what degree is this all deterministic versus starting to lean into certain kinds of AI?

Dr. Catherine Brownstein: (40:22) I think you need both. You need to be confident that you've looked at a genome with all the known things, that nothing funny is going on, using very validated best practices. And then you need exploratory pipelines, and that's what we're developing as part of my grant with OpenAI. Really, what's the limit? Where can we take this? Where can we shortcut things that before took a ton of compute and a ton of time? How do we solve cases faster? How do we determine the minimum required data set in order to make a diagnosis? What's the minimum compute necessary in order to get a diagnosis? How do we diagnose new things? How do we come up with new hypotheses faster, all using AI?

Nathan Labenz: (41:18) Well, that's probably a perfect tee up for your application of the latest models.

Dr. Catherine Brownstein: (41:22) Yeah.

Nathan Labenz: (41:23) Maybe for calibration there, before we get into workflow specifics, when did large language models start to be useful for you? Is it just with o1 or were you already starting to see some value with earlier versions?

Dr. Catherine Brownstein: (41:38) Well, we had been using it in the phenotyping areas more than anything else. I had a PCORI grant working with Ingrid Holm and Melissa Haendel where we were trying to take a patient phenotype and map it to HPO codes, and again, the layperson HPO, faster and more accurately. One thing that we did use at one point was working with seven questions, asking which system is affected and drilling down that way, and seeing the ability to get an accurate phenotype through an interaction model and using your own words compared to traditional self-phenotyping surveys and things that are on the web now. We're analyzing that still.
We were using a lot of publicly available tools that I've mentioned before, like AlphaFold and STRING-DB, and a lot of these protein impact prediction models that are required to do our jobs. We need to be able to predict the impact of a variant on a protein. We can't treat it as gospel. People who rely too heavily on these algorithms sometimes get tripped up because some of the known gene-disease relationships wouldn't pass those filters now because there's just something about that gene that you perturb it a tiny little bit and it causes a phenotype that you wouldn't even think, but we know that's true. So if you looked at it at face value, you would have skipped over it.
I think a lot of people are using them and they don't even know they're using them. They don't really know what's behind it. They just know that, oh yeah, you look at CADD score, you look at SIFT, you look at PolyPhen, you look at protein impact, and then that's a cutoff. And then along with allele frequency, but not really realizing that the aggregation of allele frequency is really powered by a lot of these models and just a ton of stuff behind the scenes. If you took it away, we would be struggling.
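The cutoff-based filtering described above reduces to a couple of thresholds, and writing it out also shows the failure mode mentioned: a known disease gene whose variants have only a modest predicted score gets silently dropped at face value. All thresholds and scores here are invented for illustration:

```python
# Toy sketch of blunt variant filtering by predicted impact and
# allele frequency. Thresholds and scores are invented; the point is
# that a real disease gene with a modest predicted impact score can
# fall below the bar and be skipped over.
def passes_filters(variant, impact_cutoff=0.9, max_allele_freq=1e-4):
    return (variant["impact_score"] >= impact_cutoff
            and variant["allele_freq"] <= max_allele_freq)

variants = [
    {"gene": "GENE_X", "impact_score": 0.95, "allele_freq": 5e-5},
    # A gene where tiny perturbations cause disease: mild predicted
    # impact, so a naive cutoff discards it despite known pathogenicity.
    {"gene": "GENE_Y", "impact_score": 0.55, "allele_freq": 2e-5},
]
kept = [v["gene"] for v in variants if passes_filters(v)]
print(kept)  # ['GENE_X'] -- GENE_Y is filtered out at face value
```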
Nathan Labenz: (44:00) Do I have it right then that with an AlphaFold type model, this is sort of after a standard pipeline basically comes back negative, then you would say, okay, let's go into essentially anomaly detection mode for this person's sequence? And you have tools for that as well that can say, hey, look, here's a giant deletion or this gene is stopped prematurely or this one has been copied over a bunch of times, whatever. There are, of course, plenty more different ways things can be weird than those. But you identify those and then you sort of say, I wonder if that maybe is the thing. I will use AlphaFold to take that genetic sequence, see what that protein actually looks like, and then do a structure comparison and see, does that look like that protein is really mangled? And then if so, that becomes a place to go deeper.
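The structure-comparison step at the end of that workflow ultimately comes down to superposing two predicted structures and measuring how far the mutant deviates. A bare-bones RMSD over already-superposed coordinate lists is sketched below; real comparisons first align the structures (e.g. with the Kabsch algorithm) and work on full PDB/mmCIF files, and the coordinates here are invented:

```python
import math

# Minimal RMSD between two already-superposed coordinate lists
# (one [x, y, z] point per residue). The coordinates are toy values.
def rmsd(coords_a, coords_b):
    assert len(coords_a) == len(coords_b)
    sq = sum((a - b) ** 2
             for pa, pb in zip(coords_a, coords_b)
             for a, b in zip(pa, pb))
    return math.sqrt(sq / len(coords_a))

wild_type = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0]]
mutant    = [[0.0, 0.0, 0.0], [1.5, 0.2, 0.0], [3.0, 2.9, 0.1]]
print(rmsd(wild_type, mutant))  # a large deviation flags a mangled fold
```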
Dr. Catherine Brownstein: (44:54) Yep, exactly. And a lot of that comes with experience too. There are some genes that are really mutated in pretty much everybody. And so if you don't know, you're like, oh, look at that, that's so cool. And then some veteran is going to be like, no, it's not that. It's never that. It's never lupus.
And then you see the gene that you've never seen before and it has a variant in it that's conserved down to zebrafish or C. elegans worms, and then you look at it in AlphaFold and you see that it's royally messing up the protein and you get excited. I mean, it's a roller coaster. A lot of times, even that will fall apart somewhere and then you'll find out that it's only really common in one specific ethnicity that's hardly ever sequenced, but the patient is that rare ethnicity. It goes to show that we need to sequence the whole world in order to understand what is actually disease causing and what is just background variation in isolated other populations.
Nathan Labenz: (46:00) Yeah. There's another fork in the road for which question to ask. We'll come back to the data because that is a can't miss area. But just take us a little bit farther down this path of, okay, we have identified some anomalies. Now we run the folding model. We see that the structure looks off. Where do we go from there? What's the next investigation after you've identified that?
Dr. Catherine Brownstein: (46:28) So back in 2011, you would get really excited about it and want to publish it. But in 2024...

Nathan Labenz: (46:36) Waterline is rising always, for sure.

Dr. Catherine Brownstein: (46:38) Yeah. The waterline is rising, and now people are like, wait a second, that might just be random. So then you want other families or other cases with the same type of thing, variants in the same gene. And there are all these sharing tools to be able to do that. There are tools called Matchmaker Exchange and Beacon, where you put in the variant and the patient phenotype and you see if anyone else has put in that same gene attached to the same phenotype, and you match and then you collaborate. Or somebody's already started a paper with 19 cases of variation in this gene causing intellectual disability, and if you have one, you can add it into that case series and get a better publication out of it that is much more convincing than if you just publish your one case, which looks pretty cool and you're convinced, but other people might not be by reading it. The bar is continually being raised on this stuff.
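At its core, the matchmaking described above is a join on gene plus shared phenotype across submissions from different sites. A toy sketch with invented submissions follows; real systems like the Matchmaker Exchange add phenotype-similarity scoring, federation across nodes, and access controls:

```python
# Toy sketch of gene+phenotype matchmaking across sites. All
# submissions, sites, and term IDs below are invented.
from collections import defaultdict

def find_matches(submissions):
    """Pair submissions sharing a gene and at least one phenotype term."""
    by_gene = defaultdict(list)
    for sub in submissions:
        by_gene[sub["gene"]].append(sub)
    matches = []
    for gene, subs in by_gene.items():
        for i in range(len(subs)):
            for j in range(i + 1, len(subs)):
                if subs[i]["terms"] & subs[j]["terms"]:
                    matches.append((gene, subs[i]["site"], subs[j]["site"]))
    return matches

subs = [
    {"site": "Boston",  "gene": "GENE_Q", "terms": {"HP:0001", "HP:0002"}},
    {"site": "Utrecht", "gene": "GENE_Q", "terms": {"HP:0002"}},
    {"site": "Tokyo",   "gene": "GENE_R", "terms": {"HP:0002"}},
]
print(find_matches(subs))  # [('GENE_Q', 'Boston', 'Utrecht')]
```

Tokyo's submission has an overlapping phenotype but a different gene, so it doesn't match, which mirrors the "same gene attached to the same phenotype" requirement.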
Nathan Labenz: (47:41) Yeah. So that brings us back to data naturally. How would you characterize the data environment? I was struck in reading through a couple of the papers. I don't have the vocabulary to go as deep as I might wish to on all of your papers, but I was able to see quite clearly that the n is small in a lot of these papers, like single digit numbers of cases. And then I've also kind of noticed a few times you've spoken about the hospital as sort of the data unit, it seems like. And I've heard from a bunch of people actually over time that, yeah, we have this sort of scarcity of data. And I've always kind of wondered, is it a true data scarcity problem, or is it a sort of man-made, for lack of a better term, data scarcity problem that really is more about barriers to access and sharing?
Dr. Catherine Brownstein: (48:31) It's a tough situation. I don't want to fault the young researcher who doesn't want to share their super cool case because they're hoping they'll find another one and be able to publish it as their finding, not as somebody else's finding in a giant research group facility across the world where they're just going to be a middle author and it's not going to make their career the way it would if they held onto it tightly and did everything themselves and got it out there.
But the problem with that is a lot of times it doesn't work out that way, and then that's not benefiting patients. You're not thinking of the patient, you're thinking of yourself. And it's much better for science, it's much better for the patients in general if everyone shares their data and has it open, and if you see something in someone else's case, you're able to match it with that group that's already working on that gene and put it out together. I mean, it's tough.
Children's is really great in that we have this CRDC, this Cohorts Committee, where you can see other investigators' data, patient data, genetic data, not the phenotype, not their name, nothing identifiable. Sorry, I need to make that extremely clear. But if you have a gene that you're working on, you can put it into the CRDC and come up with all the patients that were seen in the hospital and their genetic variation in that gene. Each patient has a de-identified ID number, and you can email the physician to find out more information about that patient.
And I've joined national, international studies that way by having a candidate gene. I go on to what's called GeneDx browser now and query the entire hospital, everyone who's been sequenced and has their data up there, found four other patients, emailed the investigator, and they're like, oh, yeah, this person in The Netherlands is putting together a case series. Emailed them, got my patient's information into that case series, and now it's awesome. They're linked to experts and we're publishing an accurate, comprehensive view of what that condition looks like.
But it's hard. I understand the dilemma for the young investigator: they've been working so hard and they want the credit for what they've been working on. They don't want to hand everything over, but it's important that they do. Everyone does.
Nathan Labenz: (51:21) You're identifying a barrier to progress here that I had not even considered, which is the investigator holding information more closely than it sounds like they should in some cases. I guess if we were to imagine an ideal data sharing scenario, exactly how to square the circle on sharing versus privacy is obviously a tough question. Maybe there's a blockchain based solution that we could imagine, or maybe we just need to change our norms a little bit around how willing we are to share genetic data. I guess maybe you have thoughts on this, but I've always kind of felt like that doesn't seem to me like a huge risk that I'd be taking to share my genetic information with some international pool of information.
But there are multiple different angles here, but I guess I'm kind of wondering, if we were to move from today's data sharing reality to an ideal data sharing reality, how much of a difference would that make for people who have these rare diseases?
Dr. Catherine Brownstein: (52:32) I'm just spitballing here, but I think it would be huge. I think there are a lot of cohorts in the back of the freezer that just haven't been sequenced and haven't been shared, not because of apathy, but because it's harder to do so. And also sometimes at a very superficial level, it's hard for the investigator to get there mentally doing that. But I think if they did, there would be a lot more discoveries, and there would be a lot more diagnoses for patients, that's for sure.
That's why I always tell patients, or people, if they email me, they're like, okay, my child has this. I'm like, here, enroll in this program and this program and this registry. And they're like, why not just one? I'm like, you want to do as much as possible. Registries are really important because when there's a new discovery, they go straight to the registry to find patients. And that way you're making sure that your sample isn't being left in the back of the freezer, and they'll get to it when they get to it. They're just hitting it from multiple sides, multiple angles.
Nathan Labenz: (53:45) Yeah. Is this sort of akin to, I mean, there are a few of these pivot points maybe in the medical system where a lot of data is, of course, locked up in electronic health records. And we sort of have this nominal interoperability requirement that somehow gets cashed out to everything getting faxed around. And it's like, what the hell is that? That seems like not what we intended, and yet it hasn't been fixed.
And then there's price transparency, another thing that is outside of the scope of this conversation but is definitely the kind of thing people have high hopes for. If you could get a price menu on the wall, maybe that would help in certain ways. Right to try is also a big movement, where people are like, you're not going to let me try this experimental drug even though I'm dying. I should have that right.
This feels like it could be another candidate for similar reform where if I was going to try to whisper into somebody in the new administration's ear, I might say, hey, look at the requirements around sharing of this information. Could we change the defaults here in a way that would move the needle in a big way?
Dr. Catherine Brownstein: (54:57) It's interesting that you say that. Again, going back to 2011, one thing I was hired for was shifting the biobank, just your samples, discards, tissue, urine, anything that wasn't used that they took from you, to an opt out, not an opt in. And I think it's still an opt in all these years later. There's a lot of inertia around this.
To be able to facilitate broad sharing, especially for these tertiary cases where privacy isn't really the number one thing on anyone's mind, it's like moving as rapidly as possible and making as many discoveries as possible in a short amount of time. I really think decreasing the barriers to sharing and right to try...
Tim Yu, who made Milasen, is two floors down from where I'm sitting right now, and it's just this incredible story of him seeing an opportunity to make an n-of-1 drug and then an extremely motivated family breaking down barriers to make it happen. They were so brilliant and motivated and smart and were able to do it. And you just think, okay, if you made the hurdles less extreme, how much more would be possible? That's an incredible story if you don't know it yet.
Nathan Labenz: (56:30) Yeah. I don't know it, but here's hoping that we might have fewer of those stories and more healthy defaults going forward. I mean, those stories are inspirational, but they sort of represent a dark matter of probably a hundred other families that just couldn't get over those barriers for some reason.
Dr. Catherine Brownstein: (56:47) And some things are just so simple and maddening. We have a bunch of cases at the Manton Center where we find the diagnosis and then we need it to be CLIA confirmed. That is, we do stuff in the research realm, and then you have to get a new sample and verify it in a specialty lab called a CLIA accredited lab, and then have the finding returned to the family through a genetic counselor or physician.
And sometimes we'll call the physician and they won't play ball with us. They don't care. They don't want to deal with it. They don't see the value or what it's going to change. And in my own family, I haven't been able to CLIA confirm a finding in one of my relatives because the doctor is like, well, I don't have email. Well, what's the value of this? Whatever. And it's just like, oh my god, this is what we're up against.
And then you multiply that by people not counseling correctly and not getting the families into research programs. As much as we're trying as hard as we can, there are still so many barriers. And to bring it back, I'm really hoping that AI can break some of this down and put some of the autonomy and ability to act into the hands of families and patients so that they're less reliant on some of this infrastructure that doesn't work as well as it should.
Nathan Labenz: (58:19) Yeah. I mean, this is an eye opener for me. I think often about, will we end up in a similar spot with respect to AI as we seemingly have with respect to nuclear power, where somehow we have thousands of nuclear weapons deployed, but we're still burning a lot of fossil fuels because we haven't been able to build nearly as many nuclear reactors as we have nuclear weapons. Something seems very off about that outcome, and I can imagine an analogous version for AI where we sort of have what we need but, through a combination of errors and barriers and obstinacy, we never quite get to the actual benefits that we could get. And it sounds like there is definitely some work to do here to make that change in this area.
Dr. Catherine Brownstein: (59:13) There's a lot already around it. Yeah.
Nathan Labenz: (59:17) So how do you think this changes going forward? I mean, we could talk about this from the patient level and what they can do. I always say that if it's me, at this point, I would go with both. Both the human doctor and the AI doctor. I always would have the conversation with Claude or ChatGPT in advance. If they don't want to talk to me, I say, I'm preparing for a conversation with my doctor, and that gets them to open up and not worry about providing unlicensed medical advice.
So the patient experience could be quite different. You could talk about that. Also really interested in just kind of how you are applying these latest models in your own work and where it's saving you time, what it's allowing you to do that you couldn't do before. So pick your favorite approach for that, but I definitely am interested in the AI enabled future of all this.

Dr. Catherine Brownstein: (1:00:09) This isn't really that crazy or anything, but I would say the biggest impact AI has made on my research is summarizing articles and genes. Being able to eliminate the time going down rabbit holes, like looking up a paper, oh, it's paywalled, okay, logging into the Harvard Library, getting that paper, skimming the abstracts, it's not at all what I want, then going back. Being able to ask for a summary and get it, and either be like, "Oh yeah, this sounds good," or move on with my life, has changed my life. It's kind of wild that it's given me hours back in a day compared to how much time I used to spend doing that. I think there's going to be a whole host of new tools where, you know, sometimes they'll clam up and don't want to do it because you're getting too close to medical advice, and maybe just specialty things that help, where you don't have to ask the same question four times to get it to answer, would be really cool. Boston Children's also launched ChatGPT behind the BCH firewall, which is great because then you're not worried about things going out, and they're able to maintain much more control, and it stays much more accurate. I still can't get citations to work properly, which is kind of hilarious, but it's getting way better. The hallucinations are getting way better. I just think it's going to be moving at light speed.
Going back to what we were talking about before, I think there's a lot of fear around it that's going to have to be addressed, but hopefully the one bad situation isn't going to be the only article people read about it, and some of the really great things that come out of this will also be properly publicized to kind of give a more balanced viewpoint. And again, keeping in mind that a lot of times these are really severe cases, really severe patients, they're making huge strides and impact, and keeping that in perspective is really important too.
Nathan Labenz: (1:02:35) Tell me more about just some of the things that you actually throw into ChatGPT. You mentioned one, which is kind of, here's my situation and here's this paper, almost like relevance filtering, like is this relevant? What other sorts of tasks do you find yourself bringing to, especially, the latest models?
Dr. Catherine Brownstein: (1:02:56) I also run the core facility here, so I'm tasked with learning a lot of new genetic techniques really quickly. If something comes up and I don't know what it means, I could Google it and find the one obscure paper, or I can put it into ChatGPT and learn about this new type of sequencing that has only launched at Children's and has one paper attached to it, and get a nice summary that I can understand as opposed to reading through everything. I meet with investigators all the time, and being able to summarize their work really quickly allows me to do a much better job in my one-on-one consultations than I would have. Also, there are close to 20,000 genes. Anytime I get a case with, okay, they think it's this, sometimes I know what that is, other times I do not. And I am able to print out a summary of the condition really, really quickly and nicely and also get the latest on it, also who's working on it, and go into a meeting much more prepared in much less time. Also, when you get a paper back, a lot of times, for some reason it always seems to be reviewer number two, they're like, "There's a whole body of literature on this," and you don't really know what they're talking about; being able to address some of the critiques and put them into context really helps. I mean, it's all cutting down on the mundane, time-consuming, really tedious part of the job, and getting back to the fun part, which is gene discovery and going through a list of 20 possible candidates and narrowing it down to 3 that you're going to present in, like, an hour and a half. True story.
Nathan Labenz: (1:05:01) Let's do that true story maybe in more depth. I mean, is that another thing where you're using ChatGPT to help?
Dr. Catherine Brownstein: (1:05:10) Yeah. I mean, why not? If you have 20 genes and you have the phenotype and they all seem pretty interesting, you can go through and look at the protein impact, so, like, ordering by CADD scores or conservation and ranking it that way, but then doing a really quick relevancy assessment using ChatGPT saves a lot of time.
Nathan Labenz: (1:05:41) So how do you set that up? Do you have a prompt template that you go back to over and over again? How much have you had to develop that? How much do you have to give in terms of detailed instructions or examples? You know, we're getting into the nitty gritty here, but this is the part where I think people hopefully can learn from your experience. And if nothing else, just demonstrating that this is possible, I think, is quite useful because there are just so many people, including software developers. You'd be amazed, maybe you may have seen this, but you'd be amazed by how many software developers tried GitHub Copilot 18 months ago, you know, when it first came out with a 3.5 model behind it, and were like, "It wasn't that good, it can't help me." And so I think there's just a lot of value in object lessons of, like, here's hard work that highly skilled, highly educated professionals are doing that ChatGPT, or obviously other models perhaps similarly, but we're focused on ChatGPT in this case, can really help with. So, yeah, I'd love just kind of as much detail as you can get into in terms of how you actually go about setting these things up, how you've iterated on them, etc.
Dr. Catherine Brownstein: (1:06:57) Okay, so for an example, I work on bladder pain, undiagnosed bladder pain in individuals. It's really severe. Sometimes they can't leave their house. It's called interstitial cystitis bladder pain syndrome. And there's no real gene attached to it. We've found a couple where it seems like there's way more variation in that gene than you would expect given the general population. And so it's a candidate gene. It's in no way a slam dunk, but I have around 500 patients in a cohort with this. And then we have done, I've done in conjunction with Josh Modelau at Columbia and Aleph Garavy, assessments of my cohort and other cohorts: what genes have way more variation in them than you would expect? And you can come up with lists, and then you can also come up with gene pathways, like multiple genes. And these pathways are interesting because a lot of times they have a label, like the small molecule transport pathway. There are like 12 genes in it. And then you want to know, are any of these genes tied to bladder pain? Are they tied to the bladder? Are they tied to bladder cancer? Are they tied to anything? Being able to ask those questions really quickly, sometimes as a simple yes or no, putting them in as a string and coming out with yes or no, saves a huge amount of time. And then the ones that are yeses, you can drill into. And I always check the nos too, just in case. But I was doing that last night and was able to just get through these pathway lists and be like, "All right, this one has 60% of the genes that have a tie to bladder cancer," specifically bladder cancer, which means they're expressed in the bladder, and there are known perturbations that cause bladder dysmorphology or bladder conditions. So this is more interesting than anything else.
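The pathway triage described above, flagging pathways where a high fraction of member genes have some tie to the organ of interest, is simple to express once the per-gene yes/no calls exist. In the sketch below the relevance set is a hard-coded stand-in for those model answers, and every gene and pathway name is invented:

```python
# Toy sketch of pathway triage: flag pathways where a high fraction
# of member genes have some tie to the organ of interest. RELEVANT
# stands in for the yes/no answers a model query would return; all
# gene and pathway names are invented for illustration.
RELEVANT = {"GENE1", "GENE3", "GENE4", "GENE5"}  # the "yes" answers

PATHWAYS = {
    "small molecule transport": ["GENE1", "GENE2", "GENE3", "GENE4", "GENE5"],
    "unrelated signaling":      ["GENE6", "GENE7", "GENE8"],
}

def triage(pathways, relevant, cutoff=0.6):
    """Return pathways whose relevant-gene fraction meets the cutoff."""
    flagged = {}
    for name, genes in pathways.items():
        frac = sum(g in relevant for g in genes) / len(genes)
        if frac >= cutoff:
            flagged[name] = frac
    return flagged

print(triage(PATHWAYS, RELEVANT))  # {'small molecule transport': 0.8}
```

The flagged pathways are the ones worth drilling into, and the nos can still be spot-checked by hand, as described above.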
At one point I almost screamed, because the gene was linked to urothelial issues, which is a great mechanism of disease, and I'm definitely going to follow up on that. I have a meeting tomorrow morning to discuss it. So it really helps. I only started working in genetics after the genome was published, so I don't know how people did it beforehand, and I think there's going to be a whole generation of geneticists who won't know how things were done before all this was available, because it's a huge game changer and time saver.
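As an aside, the batch triage workflow described above (ask a yes/no question per gene in a pathway, pile up the yeses for follow-up, then compute how much of the pathway has a reported tie) can be sketched in a few lines of Python. Everything here is hypothetical: the gene names, the `ask_model` wrapper (which in practice would wrap a real ChatGPT call), and the helper names are illustrative assumptions, not anything used in the episode.

```python
# Minimal sketch of a pathway yes/no triage, assuming an injectable
# ask_model callable so the workflow logic can be shown without a live
# API call. In real use, ask_model would send the prompt to a model.

def screen_pathway(genes, phenotype, ask_model):
    """Return (linked, unlinked) gene lists based on yes/no model answers."""
    linked, unlinked = [], []
    for gene in genes:
        prompt = (
            f"Answer only 'yes' or 'no': is there published evidence "
            f"directly linking the gene {gene} to {phenotype}?"
        )
        answer = ask_model(prompt).strip().lower()
        # Treat anything starting with "yes" as a hit; everything else
        # goes in the "no" pile (which still gets spot-checked by hand).
        (linked if answer.startswith("yes") else unlinked).append(gene)
    return linked, unlinked

def pathway_hit_rate(linked, genes):
    """Percentage of the pathway with a reported tie to the phenotype."""
    return round(100 * len(linked) / len(genes))
```

Under these assumptions, a 12-gene pathway with 7 "yes" answers would report a 58% hit rate; as in the episode, the yeses would still be drilled into and the nos double-checked before drawing any conclusions.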
Nathan Labenz: (1:09:43) So how much difference would you say you see between, for example, GPT-4o, o1, and o1 Pro when you bring those kinds of questions? Because I can see some interesting trade-offs. ChatGPT today, if I recall correctly (maybe they've just updated this, but certainly with GPT-4o), lets you enable web search. With o1 Pro, search is unavailable; that's what I thought, and that is still the case. So GPT-4o can go out online and find information that's maybe more recent than the knowledge cutoff, which could be really useful, but it isn't going to reason about it in the same way. With o1 and o1 Pro, you have more reasoning, but you have knowledge cutoff issues and no ability to go out and supplement at runtime. Do you have a taxonomy of which models you use for which things, how you know when to trust what it's saying versus when you need to fact-check, and how much the reasoning adds over 4o for your purposes?
Dr. Catherine Brownstein: (1:10:53) I'm becoming more and more convinced over time that this is going to revolutionize things. I was skeptical at first. I was like, oh, we're going to have to check every single thing; is this actually saving any time? And it's just getting more and more accurate. The reasoning is getting better. And sometimes you'll be so pleasantly surprised. You'll ask it a question, it'll say, "Okay, answering as a genetic counselor, it's this," and then completely surprise you: "Another way to look at it is this." It's doing an amazing job. I know I'm a convert, and a relatively early adopter. But I think the sky's the limit, really. We're going to get to a place where it's solving cases and shortening the diagnostic odyssey and democratizing access to genetics interpretations and sidestepping a lot of the barriers that we have right now. It just needs to convince everyone that it's accurate and that the reasoning is good a high enough percentage of the time. It's kind of hypocritical, in a way, that we're going to have a higher bar for it than we do for ourselves. We can say, "Oh, sorry, I missed it, I shouldn't have," but we're not going to forgive it if it misses something. But I guess that's the way it should be.
Nathan Labenz: (1:12:36) Yeah. I'm not sure it's the way it should be, but it does seem like it's the way it is. In self-driving, my general working assumption is that it's going to have to be 10 times safer, or have one-tenth the danger rate, to be acceptable to people. And I would guess something similar will happen here, at least when it comes to putting it in a more forward-facing role where patients could access these sorts of things themselves. If it's a tool for the professionals, then maybe we put the responsibility on the professional and can use it earlier. I would probably advocate for going for it before it gets to 10x better, but, nevertheless, that does seem like the sort of mentality we have. So, honestly for my benefit, and maybe for the audience's: how are you managing the trade-offs around needing to go out and search? Because if you wanted to use o1 Pro, you'd have to do your own search, then copy and paste it in and let it do its thing, whereas GPT-4o can do its own web search. In the very nitty gritty, what model do you go to, and how do you set it up for success?
Dr. Catherine Brownstein: (1:13:51) I'm not using the web search right now. I'm using o1 more, I think, and I'm playing with 4o. It's moving so quickly that there's no sophisticated reason for that; it's just kind of what I'm comfortable with, and then moving from there. And I'm really impressed with 4o's reasoning. Web search still kind of scares me a little, just because there's a lot of garbage on the web, and I have to be really confident in any answer I'm getting out. I think checking everything is still paramount here, but hopefully it won't be that way in the near future.
Nathan Labenz: (1:14:40) How long does it tend to think on the questions that you're giving it?
Dr. Catherine Brownstein: (1:14:44) I think it's shortening too, like by the day. At first, I remember, it would just kind of hang there for a while, and I'd be like, "You okay?" And now it's really fast.
Nathan Labenz: (1:14:58) So under a minute in most cases, it sounds like.
Dr. Catherine Brownstein: (1:15:01) Also, I think I'm getting better at the prompts. As you said, you have to learn how to ask it things for it to come out with the right answer right away.
Nathan Labenz: (1:15:13) Yeah, we'll trade tips. One thing that I imagine you've probably also found, and that I've definitely found even in low-stakes situations, is that I try to be really neutral in the way I ask questions, because one of the most common failure modes I experience is the model running with a preconception, which might have been a misconception on my part, and kind of mirroring it back to me. Now, I'm not doing genetic analysis, but even in terms of how to solve a programming problem or how I should think about architecting my application, a lot of times, if I give it a sense of where I'm leaning, it will lean that direction too, perhaps without good reason. So that's one. What else have you found to be important in prompting?
Dr. Catherine Brownstein: (1:16:02) I've found that it actually thinks a little too much, where I'll ask, "Is this gene related to this phenotype?" and it'll bring up a study, and I'll look at the study, and the gene is related, but a step too far down the chain. Which is really impressive, because it made that intellectual leap, but I need something simpler: I need a paper that just links that gene to that phenotype. I was like, no, too far, too far. So part of it is figuring out how to ask so it's not thinking as much, and it's just...
Nathan Labenz: (1:16:43) And when you describe that, it's bringing up a study out of its pre-training knowledge. You give it a question, it says, "Oh, so-and-so et al. found this," and then it sounds like it's also marshaling knowledge of the graph of interactions and saying, "Well, this paper showed this, and then I know..."
Dr. Catherine Brownstein: (1:17:06) ...that this leads to that. And I'm like, "I'm not making that case in the paper. I just want you to say it's upregulated in cancer. That's all I want." And that was actually kind of wild, because then you're like, okay, it's thinking. It's really thinking and making conclusions.
Nathan Labenz: (1:17:29) That is quite interesting. Do you think those connections are real and we're just not there yet, or is it going off in a direction that you think is just fundamentally not that productive?
Dr. Catherine Brownstein: (1:17:42) That's the million-dollar question. I don't know. Maybe I'm not smart enough to understand it, and it's right. I don't know.
Nathan Labenz: (1:17:51) Do we find out?
Dr. Catherine Brownstein: (1:17:52) We just need to keep playing with it, keep working with it, and keep using it.
Nathan Labenz: (1:17:58) This is sort of the Move 37 equivalent. You're familiar, of course, with the AlphaGo match from years ago, where Move 37 has become AI shorthand for an output from an AI system that is very surprising to human experts and nevertheless proves to be a genius move. The experts initially thought it was a mistake, like, "Oh, wow, this thing is playing Go in a way we never thought Go could or should be played," and then it turned out we actually had something to learn from the system. It sounds like you at least allow for the possibility that some of these weird analyses you're getting back might be Move 37 brilliance, but we just don't have easy ways yet to resolve whether it's going in the right direction?
Dr. Catherine Brownstein: (1:18:47) I have faith in it. I think we just have to keep an open mind and, again, keep playing with it and see what it can do.
Nathan Labenz: (1:18:56) How much data do you give it? Do you actually throw whole cases in?
Dr. Catherine Brownstein: (1:19:01) It's kind of too hard to do that right now. I'm not throwing in a medical record, even if it's behind the firewall; I'm just summarizing. We're building up to see the limits. That's actually part of the grant I have with OpenAI: to see how far we can take this, how much it can handle, and how much it can replace me.
Nathan Labenz: (1:19:23) And the barrier to doing that right now is the context? I could imagine multiple reasons it might not work, but take the simplest thing I would try: if I just said, "Okay, here's my whole medical record, and here's my whole genetic summary of all the strange variations," or maybe took the top chunk of that file, copied, pasted, "Analyze this," what makes that not viable today?
Dr. Catherine Brownstein: (1:19:52) The amount of compute needed for that, and also the opportunity for tangents, and the question of utility. Not everyone who smokes gets cancer. We have to figure out what we're asking, what's relevant, what's meaningful, and what's a good use of resources. I think people forget that every question takes energy. So we can't just throw everyone's medical record in there and everyone's genome and see what comes out. We have to be thoughtful about it and see in which cases it has utility and what information is necessary. Also, to your point, I think it'll be really interesting for trajectories and predictions, and all the medical record mining we're doing now. Can it do that on steroids? Can it come up with predictive models and interject, "Okay, I know normally you would want a colonoscopy at 40; maybe you need one at 24," based on genetics and everything it's able to see that we're not smart enough to see yet? The possibilities are... it's a really exciting time.
Nathan Labenz: (1:21:15) Yeah. Sam Altman has recently said that they're losing money on the o1 Pro subscriptions even at $200 a month. So it sounds like people are maybe not, in general, being so conscious of the compute they're consuming, and are just throwing a lot at it. The budget's got to be pretty high, right?
Dr. Catherine Brownstein: (1:21:35) Yeah.
Nathan Labenz: (1:21:36) I mean, in terms of what the medical system would pay, willingness to pay, or what insurance is prepared to pay, a $200-a-month o1 Pro subscription looks very cheap, right, by comparison to hiring professionals and teams of people like yourself.
Dr. Catherine Brownstein: (1:21:54) Yeah. Yeah. And people like using it. Well, everyone I know loves using it. I don't know if that's a random sample.
Nathan Labenz: (1:22:03) So what else is going on with this grant and your relationship with OpenAI? Are you working with them closely, iterating on use cases, and giving them feedback? What's the dynamic there?
Dr. Catherine Brownstein: (1:22:14) Yeah, exactly. They've been just wonderful and super cool, and it's fun working with smart, motivated people. I get emails at 11 at night on a weekend. They're working hard.
Nathan Labenz: (1:22:27) They work hard, yeah, there's no doubt about that.
Dr. Catherine Brownstein: (1:22:30) We're just getting back into it after the holiday. Hopefully in a few months I'll have something really exciting to talk about. I'm just blown away that they're so forward-thinking and able to support this type of work.
Nathan Labenz: (1:22:46) It's happened a lot faster than I would have thought even just a couple of years ago, that's for sure. Maybe in terms of wrapping up, if I try to summarize everything: we have a data sharing problem, and we have a limited-capacity-for-analysis problem as humans. One of those is going to require a non-AI solution; for the other, AI is increasingly ready and able to do a lot of the analysis. But then you still have a number of practical issues: the knowledge cutoff, having to do your own curating of search results, not being able to throw everything in because it may be too big for the context window, and not having all the workflows you might like because it's all in a browser, so you have to paste stuff in and get stuff back. It's still fairly manual. This is almost like your OpenAI customer interview, but what do you imagine the experience being, say, a year from now, when these systems are refined and integrated a bit more? What do you think that could be for you and for patients?
Dr. Catherine Brownstein: (1:24:02) That's a great question. I think they might wrap things so it's less free-form, or be able to guide patients and make it user-friendly with a purpose, so you're not just faced with a prompt and expected to come up with the correct question to get it to answer what you're thinking about. I think that's pretty low-hanging fruit, and it will be very useful for patients. For researchers, there's also the prompt engineering and the use cases. We're still figuring out, at least at our institution, where it will be most impactful and what people want to use it for, so that's going to be clearer in a year. Just getting people in the door and communicating; we have surveys like, "Okay, what would you use it for? What have you been using it for?" Then clearing that up and making it better. I'm trying to be realistic here. I would love to say that we're using it to solve cases. I don't know if that'll be true, but I hope it is. I think it's just going to be more intertwined in our day-to-day existence. What do you think?
Nathan Labenz: (1:25:21) All bets are off. I don't know. I mean, o3, that's another question I had: are you on the review team for o3 at this point? If not, I assume it'll be coming your way before too long.
Dr. Catherine Brownstein: (1:25:32) I hope so.
Nathan Labenz: (1:25:34) Yeah. It looks like that is another significant step up in raw reasoning ability. On the FrontierMath results in particular, everybody was citing the 25 percent success rate, but that's at the very high compute level that costs maybe thousands of dollars a problem. It was also really notable that the low compute setting was still around 10%, roughly 5 times better than anything that had come before, which had maxed out at about 2%. So the o3 series is going to be another pretty serious step change in how hard a problem these things can solve. We might start to run into context window limits being binding if there's just too much information in a medical history or a genetic file, but I suspect those can be filtered and summarized, boiled down to what matters most, such that it fits even in a couple hundred thousand tokens, which is what they currently have. I would honestly take the "we probably will be solving cases with an o3 in a year's time" side of that bet, or for that matter an o4, because the gap in time between o1 and o3 was so small, and the signals we're getting say this isn't slowing down. There's going to be more progress on this front. So I'm always wondering what's missing, and it's increasingly hard to pinpoint the things that are really missing. That doesn't mean there's nothing missing; I'm sure there are still some things, but it's increasingly hard to say what they are. My best guess would be that within a year you'll see at least some cases you could just throw into an o3 and get meaningful, insightful conclusions back. Maybe we should get together again in a year and review the progress.
Dr. Catherine Brownstein: (1:27:36) We'd love that.
Nathan Labenz: (1:27:39) Cool. Well, anything else on your mind today? I really appreciate the introduction to all your work and how you're using AI in it, but anything else before we break?
Dr. Catherine Brownstein: (1:27:48) Just thank you so much. I think this time next year might be kind of different. Hopefully.
Nathan Labenz: (1:27:56) That seems to be the new normal; change is the only thing we can really count on. Well, I'll look forward to putting it on my calendar now to get back together in a year and see where we're at. But for now, Catherine Brownstein, MPH, PhD, assistant professor at Boston Children's Hospital and Harvard Medical School, and a recent recipient of the ChatGPT Pro grant: thank you for being part of the Cognitive Revolution.
Dr. Catherine Brownstein: (1:28:16) Thank you.
Nathan Labenz: (1:28:18) It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.