AI and the Practice of Law: from Casetext to CoCounsel, with Pablo Arredondo, VP of CoCounsel

Nathan sits down with Pablo Arredondo. They discuss the evolution of legal research, how GPT-4 has changed legal practices, security and privacy in AI-driven legal services, and more.


Video Description

In this episode, Nathan sits down with Pablo Arredondo, VP of CoCounsel for Thomson Reuters. They discuss the evolution of legal research, how GPT-4 has changed legal practices, security and privacy in AI-driven legal services, and more. Try the Brave search API for free for up to 2000 queries per month at https://brave.com/api

X/SOCIAL:
@labenz (Nathan)
@tweetatpablo (Pablo)
@casetext

LINKS:
Casetext: https://casetext.com/
Cognitive Revolution (new feed): https://cognitiverevolution.ai/

SPONSORS:

Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds, offers one consistent price instead of variable regional pricing, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive

Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off at www.omneky.com

The Brave search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference, all while remaining affordable with developer-first pricing. Integrating the Brave search API into your workflow translates to more ethical data sourcing and more human-representative data sets. Try the Brave search API for free for up to 2000 queries per month at https://brave.com/api

ODF is where top founders get their start. Apply to join the next cohort and go from idea to conviction, fast. ODF has helped over 1,000 companies like Traba, Levels and Finch get their start. Is it your turn? Go to http://beondeck.com/revolution to learn more.

TIMESTAMPS
(00:00:00) Episode preview
(00:01:42) The evolution of legal research: From books to AI
(00:02:19) The game-changing impact of GPT-4 on legal practices
(00:04:01) Exploring the future of legal AI with Pablo Arredondo
(00:04:34) The birth and evolution of Casetext
(00:07:06) Revolutionizing legal research with AI and large language models
(00:14:42) The leap to GPT-4: A new era for legal tech
(00:16:42) Navigating the challenges and opportunities of AI in law
(00:27:41) The continuous evolution of AI in legal tech
(00:43:46) The importance of rigorous testing in AI deployment
(00:45:19) The importance of testing in legal tech
(00:45:46) Merging with Thomson Reuters: A new era of quality control
(00:46:49) Harnessing GPT-4 for legal document review
(00:48:27) Innovations in legal tech: Beyond GPT-4
(00:49:00) The evolution of legal research and document management
(00:49:54) Exploring the potential of AI in legal firms
(00:51:37) Security and privacy in AI-driven legal services
(00:52:43) The future of vector databases in legal tech
(00:54:10) AI-assisted legal assistance: A new frontier
(00:58:41) The pricing strategy and market response to AI legal tools
(01:10:18) The future of law and AI integration
(01:23:18) Reflections on being acquired by Thomson Reuters
(01:26:07) Closing thoughts: The impact of LLMs on justice


Full Transcript

Transcript

Pablo Arredondo: (0:00)

The first flickers of the power of this technology that I saw was suddenly the computer flagging instances. And I was like, wait, how does it know that that means overruled? Because that is not a normal way to say it. And it was because we had moved beyond literal keywords. My cofounder Jake and I were shown a demo of GPT-4. And within 12 hours, we had pivoted the entire company around it. A lot of times you'll hear people say all that will be automated is the menial work that nobody wants to do anyway. But they can also help for things that are deeply substantive, finding inconsistencies so you can go do the famous cross-examination that destroys the witness.

Nathan Labenz: (0:40)

Hello, and welcome back to Turpentine AI. If you're looking for the Cognitive Revolution, don't worry. It's not you, it's us. Turpentine is developing fresh new AI-focused shows, and this feed is set to become a best-of show featuring highlights from multiple sources. Meanwhile, we've created a new feed dedicated to the Cognitive Revolution, which you can find and subscribe to at our website, cognitiverevolution.ai. Just yesterday, we posted an interview with Andrew Lee, founder and CEO of email client Shortwave, which has the single best AI email assistant that I've personally used and has become the first email client to effectively replace the Gmail web app as my go-to email experience. Definitely take a minute to visit cognitiverevolution.ai to subscribe for that and plenty more original content, which will only be available on the new feed. Today, exclusively on this feed, we're looking at how AI is impacting the practice of law with Pablo Arredondo, who helped create and drive the adoption of AI and other advanced legal search tools as cofounder of Casetext, and today is doing the same with large language models as VP of CoCounsel at Thomson Reuters, which acquired Casetext last year. We begin the conversation with a historical overview of legal research from its pre-digital citation-based origins to the computerized but still fundamentally keyword-driven search era through the last 10 years of Casetext and the AI-powered innovations that allowed attorneys to search no longer just by keyword but by meaning, dramatically improving their ability to locate relevant case law. And finally, on to the present day in which large language models are beginning to fundamentally change how lawyers perform an ever-growing range of high-value tasks, including document review and deposition prep, contract review, and more. We talk about Pablo and team's first exposure to GPT-4, how they immediately pivoted the company to take advantage of this new technology, and how they've designed their product, and perhaps more importantly, their product development and quality assurance processes with reliability in mind. We also discuss why CoCounsel's price point, which is a couple hundred dollars per attorney per month, is not really a problem given the high-value use case that they serve and why Pablo has remained a GPT-4 maximalist, at least as of the time that we recorded a couple weeks ago just before Claude 3 was released. Pablo also shares his thoughts on the future of legal billing, the potential for AI-powered arbitration, and the evolving regulatory landscape governing the use of AI in the legal profession. While he is, as you'd expect from a VP at a major company like Thomson Reuters, extremely focused on responsible development and deployment, his hope is that large language models can make legal services faster, less expensive, more accessible, and higher quality for all. As always, we appreciate it when you take a moment to share the show with your friends. This episode would be an obvious fit for the lawyers in your life. And, again, please do make sure to subscribe to our new feed, which you can find at cognitiverevolution.ai. We'll have original content both here and there over the coming weeks, so you'll definitely want to stay tuned to both feeds. Now here's my conversation with Pablo Arredondo, cofounder of Casetext and VP of CoCounsel at Thomson Reuters.

Nathan Labenz: (4:09)

Pablo Arredondo, cofounder of Casetext and VP of CoCounsel. Welcome to the Cognitive Revolution.

Pablo Arredondo: (4:15)

Thank you so much for having me.

Nathan Labenz: (4:17)

I'm excited about this. We've got what I think is going to be a really interesting and hopefully a good mix of fast-paced and deep diving into all things legal AI. You've had a real front-row seat in this business over the last 10 years since you founded Casetext. I would love to start if you would give us a backstory, just a brief history of the application of maybe technology more broadly, but obviously especially emphasis on AI in the legal profession. I realized that while I've been into the product and played with it today, and we'll get into that in a lot more detail, I don't really know that much about how things were before. I know the Abe Lincoln story of he had to walk a long way, I think, to borrow books and return them. I mean, that's kind of all I know.

Nathan Labenz: (5:03)

Where are we coming from in legal research, and where have you been over the last 10 years with Casetext? Then you could take us all the way up to the present.

Pablo Arredondo: (5:11)

Yeah, absolutely. So law is sort of an interesting area because you had some things in law much sooner than you had them in other places. Like a citation graph, things citing to other things, like what we now think of as hyperlinks on the Internet. Not many domains had that in the same sort of rigorous systematic way that law had because you were constantly citing to earlier opinions and citing specific pages in earlier opinions, and it was always sort of building on itself. And very quickly, especially in America, which had an explosion of lawyers and litigation, there were quickly more judicial opinions than anyone could read in their lifetime. And so you immediately started to have this information retrieval problem and challenges. How do you find the right cases? And not just finding the right cases, how do you know if the case you're looking at is still good law? One of the very earliest legal innovations was a guy named Simon Greenleaf. This was so long ago. So he was in the town of Gray in the territory of Maine. So it wasn't a state yet. And it was so long ago that there wasn't any American case law to rely on, so he decided to cite British case law because that's what we had to cite to. And he cited a case that had actually been overruled by subsequent courts. Essentially, the courts had later said that's no longer good law. And the judge threw him around the courtroom, and he was embarrassed. And he came back, I'm taking a little artistic license here, but he was embarrassed and said, I never want to feel embarrassed like this again, but how do I know what cases have been overruled? I can't read every case and know them. And so he started to make a list: these cases have been overruled by this case. And so this, I think in the early 1800s, was an early example of a legal tool that then grew and grew and became this resource, this sort of meta resource that lawyers could use to just better navigate the law and better represent their clients. Later, we're going to see that our first use of large language models at Casetext was actually because we were putting our shoulder to exactly that challenge that was there in the early 1800s. But there are a couple of things in between that I think are worth noting. It used to be that case law, at first, judges literally weren't even writing things down. It would sort of be like, hey, I think I remember Fred said something about that. That's how common law was handed down. Then they started writing it down in very hodgepodge ways. You'd have crazy things like an almanac, and the guy doing the almanac would also write reports. You'd see, like, and then Lord Mansfield held that, and by the way, I think we'll have a great crop of corn. Just a complete mess until you had West Publishing and this guy John West (Thomson Reuters is sort of the continuation of all of this), who first systematized the case law into these reporters. And so whenever you see a lawyer in his commercial or on TV and he's got those books behind him, those are all from that effort to systematically take all the case law, put it into one form, into one series of books, and make it much more navigable. But of course, this was long before computers. And so the question now is how do I find the right case? And so you had this taxonomy that was created by humans who went methodically and said, okay, we're going to divide up law into all these different areas, and we'll create this flow, this taxonomy. I'm looking for torts. 
Okay, somebody's been injured. Now I'm looking for animal attack. Now I'm looking for dog. And this was quite useful because it could help you find what you needed, but it was also a prison of sorts. Because however they divided up the law, that was what the law was. You had to stick to whatever framework they had. And so that was the governing paradigm for legal informatics for a while until the, I think, very late sixties and early seventies when you started to see the digitization of case law. And so there you had another company, now Lexis. This actually came out of, I think, a group in Ohio that started with just the Department of Agriculture. I'm blanking on the specifics, but essentially a small, limited project grew until, suddenly, you had everything digitized. And so this was a huge step forward in some ways, but also brought in other challenges. Now I could navigate the millions of opinions just by searching a keyword. It's whatever keyword I want. And then you got into Boolean searching: patent within the same sentence as computer. And so you could now search using keywords, but as we all know, keywords are quite limited. They're very literal. And you had issues of both precision, which is to say things that came back just happened to have that word but weren't what you cared about, and you had more insidious issues of recall. There were things you did want to see but didn't see, because they happened to use different language. And that was really the paradigm through the time I was practicing. I was a patent lawyer at Kirkland & Ellis. That's what surrounded me during my time practicing law. So now we're starting to get closer to Casetext in these new systems. And I'm going to be nerding out on this because I'm assuming your audience doesn't mind nerding. I hope that's okay.

Nathan Labenz: (10:15)

Yeah. No, we're here for it. People want the nuggets. A lot of them are building their own tools, obviously, in many different areas. So they want to learn from the depths of your experience in particular.

Pablo Arredondo: (10:27)

Alright. Yeah. And I mean, the early stuff that we were doing with Casetext, so one of the issues we had was, remember I talked about knowing, does this case overrule this case? That became this very important tool called the citator. The famous one was called Shepard's, and then that got bought by Lexis. Thomson Reuters has theirs called KeyCite. Really essential systems. But the issue with both of them at the time that Casetext got started is that they were only looking at the direct citation path. You could only see cases that directly cited to your case to get a sense for that area of law and how things are being treated. And the analogy I make is imagine you went to a video store. Some of you may have never heard of these, but there used to be a time when you'd go in to rent movies. And you ask the clerk, I'm really interested in, I love The Godfather. Can you recommend any other movies? And imagine the clerk said, Godfather Part II, Godfather Part III. That's the end of the list. And you'd say, well, wait a minute. That's a pretty impoverished list based on that. What about Goodfellas? What about MASH? What about all these other movies? And so one of the early things we did at Casetext, this is last decade, was to exploit the same patterns that you had been seeing in Spotify and Amazon: these soft citation relationships. When Spotify recommends a song, it's not because that song literally references this other song that you liked. It's that people who download this song tend to also download that song. And so that sort of soft citation relationship was this big blind spot. And our first commercial product was you could take the brief, the lengthy document that lawyers use when they're trying to persuade the court, and we would analyze all the cases you cited and then run a much more robust citation analysis. And then we could suggest cases that you had overlooked, cases that weren't in the brief but that you should have read. There were these early moments at the top firms, firms where money is not an object relative to the stakes of the litigation. I mean, they buy all the tools that they think could help their clients. They were saying, how were we missing this case? How did we miss this case? We have the best attorneys, et cetera. And it was just because the technology they were using had this blind spot. So we started then selling these specific technologies to the top firms, affluent firms. But at the same time, we found that there were also a lot of attorneys who couldn't afford the best, the Westlaws. And they were relying on Google Scholar. They were relying on tools, frankly, that could have been a lot better. And so we then decided we wanted to build out a full-fledged research system that they could use for all of the different things you need to do for research. And so we ended up in this sort of bimodal place. There's this great Jack Daniel's commercial where it opens with them serving Jack Daniel's at the Ritz at some wedding, and then it shows a biker bar with all the bikers, and they're also doing Jack Daniel's. And it says, Jack Daniel's served in fine establishments and questionable joints since 1870 or whatever. So Casetext sort of was this thing where we were at the Manhattan skyscraper firms, representing antitrust, eight-figure litigations. But also increasingly, you could find us in the strip malls of Pasadena or wherever solos were hanging their shingle, also doing very important work, of course. 
Because it's also very important that folks who don't have deep pockets have representation. Okay. In creating that full-fledged research engine, we needed to do exactly what that lawyer from the 1800s, Simon Greenleaf, did. We had to know, does this case overrule this case? Because then we can warn the attorney when they're reading it. And sometimes the court will be very explicit. They'll say, we overruled Jenkins. Great. Easy to parse. But sometimes the court will say, we regretfully consign Jenkins to the dustbin of oblivion because judges can say it however they want, and sometimes they like to get poetic. And so now we have this really profound information challenge. How do you get a computer to detect that kind of overruling, that kind of treatment where it's not using the normal words? And to be clear, this was part of a process that involved humans. It was very much a human in the loop, but how do you triage? How do you identify? Humans should take a look at this. Okay. So in trying to do this, we're using old techniques and then, rejoice, here comes BERT. Here comes large language models. And our first experience, the first flickers of the power of this technology that I saw, was suddenly the computer flagging instances. And I was like, wait. How does it know that that means overruled? Because that is not a normal way to say it. And it was because we had moved beyond literal keywords because this language model was starting to actually understand. You can, there's a whole debate about understand versus not. And truthfully, when we're talking about things like BERT, I think it was more just that we had encoded language in such a way that it would draw in these examples. And so that was the beginning, I think 2018-ish, very early on of these large language models.
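To make the triage idea concrete, here is a minimal sketch of the human-in-the-loop flow Pablo describes: a BERT-style classifier scores candidate sentences, and likely negative treatment gets routed to a human reviewer. The model name is hypothetical; Casetext's actual system, training data, and labels are not public.

```python
# Hedged sketch: a fine-tuned BERT-style classifier triages sentences that
# may signal overruling, so humans review only the flagged candidates.
# "my-org/legal-bert-overruling" is a hypothetical model name, standing in
# for a checkpoint fine-tuned on labeled treatment examples.
from transformers import pipeline

classifier = pipeline("text-classification", model="my-org/legal-bert-overruling")

sentences = [
    "We overrule Jenkins.",
    "We regretfully consign Jenkins to the dustbin of oblivion.",
    "The plaintiff relies on Jenkins for an unrelated point.",
]

for sentence in sentences:
    result = classifier(sentence)[0]  # e.g. {"label": "OVERRULES", "score": 0.97}
    if result["label"] == "OVERRULES" and result["score"] > 0.8:
        print(f"Flag for human review: {sentence!r} ({result['score']:.2f})")
```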

Nathan Labenz: (15:29)

Hey, we'll continue our interview in a moment after a word from our sponsors.

Nathan Labenz: (15:33)

Quick interjection because I don't know the history on BERT. Did they open source it for you, or were you just able to, so you were able to get a trained BERT to work from, from the beginning?

Pablo Arredondo: (15:47)

Yes. And this is one of these great moments that really shifted, I think, society in some ways. And unfortunately, and there are better folks than myself to talk about that aspect of things, less and less, I think, are people putting things out there and open-sourcing them, at least certain aspects. I know there's a lot of open source movement in AI, but I don't think Google's open sourcing Gemini. Let's put it that way. But yes, they put out BERT as a paper, Devlin et al. The code was there. The techniques were sort of just, it was all, you can go do it. And what we did is we actually took that same approach and just trained it on US common law. But basically took directly from Google the techniques, the approach, and then, frankly, I think some of the code as well to run it. And that was just an incredible moment for us. And everything we're doing now is really sort of just this evolution. Sometimes punctuated equilibrium. It's not just linear. It's like, woah. But very much in sort of the same path that started with BERT. And so once we saw that it could do this for detecting overruling treatment, we said, okay, wait a minute. Now we can do this for search. And so we created this engine called Parallel Search where an attorney could just enter a full sentence. It could take a sentence from your brief, and it would find relevant case law even if there was no overlap in the keywords at all. And we had a demo sentence we did about, we use Cyrus as the name, somebody being fired for not wearing a mask at work, and suddenly it was finding cases where people weren't putting on safety helmets. It was finding analogous case law. Things about sartorial safety. And for lawyers who had basically been in what I theatrically call the keyword prison, this was as seismic an event as you can imagine. At the time, I thought you will never get more seismic than this. Boy, was I wrong, and we'll get to 2022 in a second. And so we released this technology through Parallel Search, and it just immediately was a huge hit. Courts were using it. Firms big and small. And by adoption, I just mean the biggest 200 firms. Getting lawyers to take time to adopt new technology, it's a hard fight. They're busy folks. They're risk-averse, et cetera. Parallel Search was this example of one that you just saw kind of spreading, and it was absolutely beautiful. But it was because so often lawyers are trying to find something, but they don't know the exact words. And so we did it for case law first, then we expanded it so that you could upload anything, transcripts, e-discovery, contracts, whatever you wanted and apply these early language models to them. GPT-3 happens. I think that's 2020. We see it. Wow. That's really neat. Ten minutes later, though, we know we can't use it for lawyers. It's just not there. It's an amazing kind of sense of, oh, wow. But not reliable enough, not nuanced enough, just not good enough to be something for attorneys. And so we continued focusing on our search systems and doing that stuff. And then one glorious day, September 16, 2022, my cofounder Jake and I were shown a demo of GPT-4. And within 12 hours, we had pivoted the entire company around it. It was just so much better than what we had used before or seen before. And unlike our BERT systems, which were really just about search, this tool was being used for, you could summarize, you could create timelines. We had this period where, and remember, this is before ChatGPT came out. And ChatGPT was based on GPT-3.5. 
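The Parallel Search idea, matching by meaning rather than by keyword, can be sketched with off-the-shelf sentence embeddings and cosine similarity. The model below is a generic stand-in; Casetext trained its own system on US common law.

```python
# Hedged sketch of search-by-meaning: the query shares no keywords with the
# best-matching passage, but their embedding vectors are close.
# all-MiniLM-L6-v2 is a generic stand-in for a common-law-trained model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "Employee fired for refusing to wear a mask at work"
passages = [
    "Worker terminated after declining to put on a safety helmet on site",
    "Contract dispute over late delivery of industrial equipment",
]

query_emb = model.encode(query, convert_to_tensor=True)
passage_embs = model.encode(passages, convert_to_tensor=True)
scores = util.cos_sim(query_emb, passage_embs)[0]

# The safety-helmet passage should score far above the contract dispute.
for passage, score in sorted(zip(passages, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {passage}")
```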
And at some point later, Sam said, we broke it up because we thought it would be too much to see Chat and GPT-4 at the same time. Well, we had the absolute privilege of being the small group that got both barrels. So suddenly, we were on a Slack channel where we could ask GPT-4 to create a timeline or change this or, I mean, the next 72 hours, I felt like I was going to burn a hole in my brain from just how many things we tried and how many, it was just incredible to see what it could do. And so we pivoted the entire company around it and we just worked hand in hand with OpenAI, giving them feedback from lawyers. We were their domain experts for law. What we realized though, of course, early on is you can't use this stuff. You can't use GPT-4 as a chatbot for law because it hallucinates, because it's not up to date. I asked it about a case I actually worked on, and its answer was so believable that I was second-guessing myself about a case I worked on. And then I got mad, and I was like, no. You're wrong. I worked on this case. And I kid you not, it said, you can sit there and brag about the cases you worked on if you want, but I'm right and here's proof. And then it concocted a URL to nowhere. And so you think, wow, this isn't, you know, you can't use it like that. It will hallucinate. It will make things up. And so from the very earliest days, we understood you had to use what now I think is very common. Everyone I think is becoming almost, it's just retrieval-augmented generation. You have to anchor the system in a search engine that will retrieve real results and then force GPT-4 to answer based on what it's seeing in front of it, the real case law, not freestyling an answer.
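The anchoring Pablo describes is what is now called retrieval-augmented generation. A minimal sketch, assuming the OpenAI Python client; `search_cases` is a hypothetical stand-in for a real case-law index, and the prompt wording is illustrative.

```python
# Hedged RAG sketch: retrieve real cases first, then instruct the model to
# answer ONLY from the retrieved text rather than freestyling an answer.
from openai import OpenAI

client = OpenAI()

def search_cases(query: str) -> list[str]:
    # Hypothetical stand-in: a real system queries a case-law search index.
    return ["Smith v. Jones, 123 F.3d 456 (9th Cir. 1997): ...holding..."]

def grounded_answer(question: str) -> str:
    context = "\n\n".join(search_cases(question))
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "Answer using ONLY the cases provided below. If they do not "
                "answer the question, say so. Never cite a case that is not "
                "in the provided context.\n\n" + context)},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```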

Nathan Labenz: (21:10)

I had a very similar experience. I was in that same time frame as an early OpenAI customer, got a preview and had an uncannily similar experience where I was asking it about chemistry research. I had been a chemistry student as an undergrad and asked about the research agenda of the professor that I worked for and asked if she ever had a coauthor named Labenz, namely, meaning me. And it said yes and what paper, and it gave me the paper. And I was like, wait a second. Did she put me on another paper that I don't even recall being a part of? And then this was the entry point for me to understanding hallucinations. Not that I necessarily thought it was infallible before, but it was like, woah. You really have to be careful because here I am, and I think it was the middle of the night because I was also just really floored and just wowed and not sleeping a lot for a few days there, at least, really for a couple months. Yeah. I remember being like, this is so believable. It's confused me as to whether or not I was actually involved with this. I had to go look it up for real and reground myself on what was I actually a part of a few years back. So, yeah, it's a very uncanny, similar experience.

Pablo Arredondo: (22:26)

I mean, we were fortunate that we got to get tricked by it very early on in the lab, as opposed to later, like these horror stories of the lawyers who went into court relying on it. There were startups that were putting up marketing with hallucinations in it. They were putting up their screenshots, and it's like, that's not a real case. So it definitely took a while to get there. So we put this out. And the truth is, it was so amazing that there was always this voice that's like, is this real? Can this really be real? Because all these things can't really be happening. Because it was frankly so unexpected. I think even the really enthusiastic folks weren't anticipating, maybe someone was, but certainly the folks I knew weren't anticipating the leap that was GPT-4. And for me, I was a coauthor of the study where it passed the bar exam. We ran that with some colleagues I knew from Stanford. But to me, what really drove home that it was real was when we brought it to legal research, and the law librarian community did not rip it to shreds. And in fact, they would say things like, this is solid. That is the most gushing praise I've ever heard from a law librarian on anything. I mean, they live to tear things like this apart. And to see them sort of say, wow. It actually seems to be able to deal with the nuance of different areas of law and it's understanding these queries. That's when I was like, okay. Wow. This is actually happening. And that really helped. So early days. So first, we had six weeks before ChatGPT came out. So now we were getting to show it, but in a very limited way; OpenAI, understandably, allowed only a very small group that we could share it with. But this was a time where you could just demo the poems and people were floored. You were showing them everything. They'd never seen LLMs in any capacity. So you could just show them poems and show it translating and things like that even before you got into legal tasks. And they were blown away. They were, is it Skynet? There was all of that sort of angst. And one of the unique perspectives that we got because we had this early access was seeing a lot of people's first reaction to just LLMs as a whole and that mix of excitement and sort of fear and confusion. And I think there's something uniquely human about language. So when you see a computer for the first time doing it, it's something. And even though, obviously, it's just guessing the next word, just having it mimic language so well was unnerving, I think, to a lot of people. And so then we started, basically, we built out these different skills. We just took use case after use case after use case, packaged them all together as CoCounsel, the name of the product, obviously an homage to Copilot. And that's been basically what has been going on. It's been so different. The debut of CoCounsel was on Morning Joe on MSNBC. The idea of a legal tech product going on national TV for its launch, it would have just been absurd. Morning Joe wasn't calling when we had the citation blind spot. Just the amplitude, just how much fervor and excitement. And suddenly, I'm used to giving demos. You give it to one law librarian, and they go, that's kind of neat. And in three weeks, I'll let another law librarian see it. That's your traditional legal tech. Here, you'd show it and all the partners would assemble in the room, like somebody's been embezzling or something. It was just a completely different sense of it. 
And what we really had to focus on is being responsible with it, not having it claim it can do things it can't, really putting in the guardrails to make sure that it wasn't hallucinating, or at least only a de minimis amount of hallucination. Things like policing the results for quote checks, using old-school tech to be like, does that quote actually appear in that case? And then really creating a system that facilitates oversight because the danger, of course, is attorneys over-relying on this stuff. Because it seems so good, it seems so, oh, it's done. When in fact, it's not near human level for really important legal thinking and reasoning. I think it has a ways to go.
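The quote-check guardrail mentioned above can be done with plain string matching, no model required. A minimal sketch, assuming simple whitespace and quote-mark normalization:

```python
# Hedged sketch of the "old school" quote check: normalize whitespace and
# curly quotes, then verify the quoted span appears verbatim in the cited
# opinion before showing it to the user.
import re

def normalize(text: str) -> str:
    text = (text.replace("\u201c", '"').replace("\u201d", '"')
                .replace("\u2018", "'").replace("\u2019", "'"))
    return re.sub(r"\s+", " ", text).strip().lower()

def quote_appears(quote: str, opinion_text: str) -> bool:
    return normalize(quote) in normalize(opinion_text)

opinion = 'The court held that "the duty of care extends  to foreseeable plaintiffs."'
assert quote_appears("the duty of care extends to foreseeable plaintiffs", opinion)
assert not quote_appears("the duty of care is unlimited", opinion)
```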

Nathan Labenz: (26:37)

Let's get into the product in more depth. It's really interesting. You obviously had a very early start, and I've had a chance to go in and play with the product hands on. Now, I'm not a lawyer myself and I also don't have a lot of legal documents lying around. So I couldn't necessarily push it to the limits that your actual customers would. But definitely a number of things jumped out at me. One is that you've been building a lot of the same things that the community as a whole has been building, but probably in parallel because being early to it, you didn't have the luxury of LangChain or LlamaIndex or whatever at the time that you were building. So you're identifying these problems in your own lane as the community more broadly is also figuring out what are the complementary tools that the language models need. I'm interested to get your perspective on a number of different dimensions of that. Maybe for starters, let's just describe what the product experience is. I think there's a couple of things about it that are notable. One is that it is a lot more structured than your typical chat. There are two tabs as the main interface. There's the chat tab where you're going back and forth, and that will feel very familiar to anyone who's used ChatGPT. And then there is the results tab. This is where from the chat tab, you essentially can create tasks. This is an interactive experience where you're having dialogue. The system can come back and say, okay, here are the tasks I understand you to be asking me to do. And then you can say, yep, okay, go do those tasks. And then those tasks actually get run in the background or in parallel. You can elaborate a little bit more on the kinds of tasks and the volume of tasks that people are putting through those. And then you can come back in a little bit and actually look at the results. So in that way, it's a little bit more of a... I often talk about copilot mode being your real time interactive engagement with AI, and then delegation mode on the other hand being if you're setting up workflows and you're trying to get to the point where you're not going to check every single output, I call that delegation mode. That's much more prompt engineering, much more systems integration, and so on. This, I think, lives in an interesting space in between where you're in that chat real time interactive mode, but you're able to spin off these individual tasks and then they actually live somewhere else that you can come back to and review. What are the tasks that people are doing and in what kind of volume? Where are people finding the most value from this product experience today?

Nathan Labenz: (29:23)

Hey, we'll continue our interview in a moment after a word from our sponsors.

Pablo Arredondo: (29:27)

When we started, at first, it was just buttons for each skill. There wasn't basically a chat. You had a button called legal research. You had a button called review documents. And what we realized is that people like the chat flow. The chat is just a more intuitive and natural way to do that. And so then we switched maybe a few months ago to having the first thing you interact with be chat, and that raises challenges. You have to understand the intents. What are they actually asking you to do? Whereas it's simpler if you just hit a button. And then we sort of realized, well, wait a minute. Okay. Sometimes you might want to go back a little bit to more structured. So, hey, I want to review some documents. I have questions. Okay. Then we give you a more form-like ability to put in the questions if you want. So one of our challenges has been how do you strike that balance between the wonderful flow of a chat and the intuitiveness of a chat, but knowing that behind the scenes, we do have these discrete skills and these discrete capabilities. How you do that is something that I think will frankly be continuing to evolve. I think we're still trying to find exactly the right balance. In terms of the skills, you can loosely divide them into... There are two major flavors of lawyer, maybe three if you include criminal folks who do crime. But for clients, there's litigators who are used to this person in court, and they're the ones that are constantly searching case law. And then there's transactional law, which is, I want to merge companies. And there's a huge amount, obviously, of law that goes into evaluating those contracts and the various due diligence and things like that. Casetext for 10 years had really been focused on litigators because we were legal research. And that, although it certainly can sometimes impact transactional law, really litigators are the ones who are constantly searching case law to find cases to cite to the judge. But when, even starting before GPT-4, when we had our BERT-based search, suddenly we were being told by these firms, hey, can our contract, our transactional guys take a look? Because they too sometimes need to search for something. So our first battery of skills during the beta phase spanned both. So we had legal research where it'll just run RAG on legal research. We had things like deposition prep. So I'm going to depose so and so, and it could suggest a bunch of topics, let you alter the topics, then suggest a bunch of questions to sort of jump start your ability to prep for a depo. We had things like timeline, so you could upload a big messy corpus of documents, and it would create a chronology, which is something that's very labor intensive to go do as a first year associate spending a lot of nights just trying to map that stuff out. And then on the transactional side, we had the ability to ask a question of a merger agreement. What does this term say? And then fancier things like one called contract policy compliance. Basically, let's say you're a big Fortune 50 company and you say, here's how we do things for our contracts. We insist that it be governed by Delaware law, or we won't sign unless the IP has this or this. What we could do was basically give those policies to GPT-4, and then anytime a draft contract came in, it would police the draft. Not only would it sign off on the relevant clause, say, is it kosher? Does it actually comply with how we do things? 
And if not, it would then suggest a redline for how to change the draft to comport with how you do things. So it's like, you can almost think of it like: there was spell check, then came grammar check. Now, thanks to GPT-4, we have substance check. We have the ability to check it for deeply substantive aspects of things to see if it complies. And then we had things like what's market. You want to see how other companies have handled this aspect of a transaction. We could pull all the relevant data from the SEC. GPT-4 would sort of scan it, find relevant stuff, and then synthesize a report for you. So even though transactional law was not really our wheelhouse, with the power of GPT-4, it's pretty fast and you can start creating pretty valuable skills. And then on the litigation side, I mentioned a few of them. Some of the more interesting use cases from the beta phase, actually, we haven't yet productionized. But one great example, we had a Fortune 500 company, and there are certain expert witnesses that make their living just testifying against the company. The guy who just year in and year out, I'm the one that the plaintiff calls to go say why, Monsanto or whoever, you know, take your pick of company. So they said to us, hey, if we gave you all of this expert witness's prior expert reports and prior testimony from multiple earlier litigations, could CoCounsel analyze and find contradictions that we could use for cross-examination? And to me, that was one of the most amazing use cases because that is the kind of thing that lawyers go to law school to do. A lot of times you'll hear people say, what will be automated is the menial work that nobody wants to do anyway. And look, the truth is, yes, LLMs will help tremendously on that front, but they can also help for things that are deeply substantive. Really, finding inconsistencies so you can go do the famous cross-examination that destroys the witness. That is at the heart of what being a lawyer is sometimes, or certainly being a litigator. And so what we found in just the earliest days is the ability to point it at pedestrian things that are more tedious than intellectual, but also to see how it can help with things that are actually quite intellectual and quite substantive.
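The contract-policy-compliance skill can be sketched as a loop over policies, asking the model whether a draft clause complies and, if not, for a redline. A minimal sketch, assuming a JSON-mode-capable OpenAI model; the prompt wording and JSON shape are illustrative, not CoCounsel's actual implementation.

```python
# Hedged sketch of "substance check": for each policy, ask the model whether
# the draft clause complies, and request a redline when it does not.
import json
from openai import OpenAI

client = OpenAI()

policies = [
    "All contracts must be governed by Delaware law.",
    "Vendor must indemnify us against third-party IP claims.",
]
draft_clause = "This Agreement shall be governed by the laws of Texas."

for policy in policies:
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # assumption: a model that supports JSON mode
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": (
            f"Policy: {policy}\nDraft clause: {draft_clause}\n"
            'Reply as JSON: {"complies": true|false, "redline": "<suggested text>"}'
        )}],
    )
    verdict = json.loads(response.choices[0].message.content)
    print(policy, "->", verdict)
```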

Nathan Labenz: (35:06)

So in terms of how that is built, it's crazy that we're still not even a year into GPT-4 being public, and there have obviously been many versions at this point and enhancements in terms of new features such as function calling and also just a better understanding of how to do the retrieval augmentation and all that kind of stuff. I wonder if you could... maybe the best way to ask it would be to give a little bit of a history of where you started and how things have evolved with the product. I imagine that... And context, by the way, is another one that obviously is huge. So first version, 8,000 token limit, limited access to the 32k. Maybe you're using the 32k, that obviously could get a little bit expensive. So I imagine in the early days, highly structured and big emphasis on managing context. Perhaps it's still that way and you have a bunch of very discrete prompts and it's about chaining them together. Or perhaps because the models are getting a little bit better generally and the context window is growing, you're able to just push more and more onto the models and rely less on your own structure.

Pablo Arredondo: (36:16)

Yeah. So a few things there. I mean, the context windows have sort of broken my heart a little bit. Because they turn into a mirage, right, which is, you know, they're missing stuff in the middle. They're not as accurate when you fully utilize them. So I say it's like saying, my boat can seat 100 people, but if you put more than 10 people on it, it will sink. It's like, well, does that boat really seat 100 people then? So for what we do, being right is so important. So far, the longer context windows haven't really yet been there for us. Now it's getting better. I think there's progress being made. And another limitation for us is we do really rigorous testing on a model. So OpenAI will come out with an improvement. It's not like we just swap that in, we're not just pointing to whatever their current one is. We have to go test the model because it might be better at one thing, but worse for what the lawyers are using it for. And so there's always this lag where we know there are these wonderful things out there, but we don't yet have them in our system. There's a process for that. I mean, I remember one great example is, now there's a JSON toggle. Just JSON. We spent weeks trying, everyone begging it. Please do JSON. Please do data. That's an example where the evolution of just the tooling, I think, is what you're pointing to. Things are getting better and better and easier and easier for developers. And believe me, well worth it for the early access. I'm not complaining by any means, but we certainly... Some of those we were doing the hard way, and now there's easier methods. But in terms of... Yeah. So I think that's the main thing. I think that there's a lag for us because we have to go test and vet everything. We can't just hot swap it in, even though I know there's a lot of great stuff that's coming out. And then there are other aspects: law gets really violent. There's often violent topics. There's often racist topics. There's often really... And so we had to have these filters removed so that GPT-4 could interact with this stuff in a way that the chatbot, as an issue of alignment, would basically steer clear of. So there are a number of ways where our system, which is on our own dedicated instances, has basically its own kind of path. There are a number of instances where we had to deviate, both to kind of protect our use cases and also just to ensure quality.

Nathan Labenz: (38:26)

Are you fine tuning GPT-4? I mean, it sounds like there is a slightly different version that you are using as opposed to the normal public API?

Pablo Arredondo: (38:38)

Most of what we've done so far has been relying on just GPT-4 without any fine-tuning. The retrieval engine for some of our RAG, especially for case law, was using a home-built system that we trained on common law. But very much it was GPT-4 that was doing the heavy lifting. Now I'm very proud to say that our team, as part of our alpha testing, we got to do some reinforcement learning for the actual GPT-4. We had folks teaching it not to swear and folks teaching it not to tell you how to build Molotov cocktails. We actually had some of our very fantastic reference attorneys and folks giving input on how best to describe how a document is relying on a case. So I... It's like we got to put one tiny little pylon in the Alhambra. I mean, I'll forever talk about that. So that's the closest to sort of legal-specific stuff. Now GPT-4 was trained on a lot of case law. I think it's certainly seen a lot of this law, but we haven't yet done any sort of fine... We don't have a variant of GPT-4 that's fine-tuned yet. Certainly, as you know, that's something where, increasingly, OpenAI is looking for partnerships, and there well could be some things there.

Nathan Labenz: (39:48)

On the eval, it sounds like you have put a lot into that. I think that's something that a lot of people are coming around to now. In the main application that I have built, which is a video creation app, the core language model test there is video script writing. So we don't need to be nearly as rigorous in our testing. But one thing I have noticed is that the flywheel is getting tighter, right? It's getting easier and easier to run. We use 3.5 Turbo fine tuned. It's getting easier and easier to rerun the fine tuning. Now you've got a new model. We used to not worry too much about the evaluation process because we felt like we were just up close and personal with it enough that we kind of would get a feel for how it would go. Then we had a couple spot checks that we would run and, you know, that would be kind of that. Now with the flywheel getting so much faster, it's like, geez, we do need an automated way to do this. Not to say that... I'm always a big believer in not fully automating this stuff. You may have a different opinion, but I'm always like, there's no substitute for still being at least somewhat hands on. But I'd love to hear about your system because, I think a lot of people right now are searching for what is the right balance to strike in evaluations? How much should they be doing with model-based evaluations? How much should be objective? How much should still be manual? What have you learned, built? Are there any good tools that you love? I mean, everything about evals, I think, is of interest.

Pablo Arredondo: (41:17)

You know, some of the most important stuff we did as a company, the users never see. And that was we built an internal framework. So I mentioned we sort of pivoted the entire company within 48 hours of seeing what GPT-4 was capable of. Part of that was creating these trust teams whose job is you wake up and you try to find this thing messing up. And by messing up, I don't mean crazy hallucination, although that too. But also, is it just saying the wrong answer? Is it quoting the wrong case? Just sort of any and all ways that you can find this breaking. And we just built a whole framework that had these series of tests that you could run. So it's like you said, it's a mix. You want as much automated as possible, but then there is no substitute for also having people using it. And in our case, attorneys using it. So you hear very quickly if there's an issue. And so we erred on the side of testing. And could we have moved faster? Probably. But we really thought that what it means to be responsible with this is to just go overboard with testing. And so when we first had our beta, I think we had 16 or some much larger number. We actually launched with a very small subset of that because those were the only skills that we felt had met the rigorous testing we had done. And there were some that were close and that was painful. Like, actually, our timeline skill was an example. People were like, why did you take that away? I love it. I love it. We said, well, our testing, it's not there yet. So I think it's use case specific. For law, for medicine, certain professions where screwing up has a really bad impact. And there's other ones where maybe you have a little bit more wiggle room. Okay, like... I think it's use case specific. But for law, it's just one of these ones where you can't be testing enough, frankly. Because there's just, people get hurt or harmed if you don't.
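A regression-style harness of the kind Pablo describes can be sketched as a pinned benchmark with checkable expectations, run before any model swap. Everything here is illustrative; `run_skill` is a hypothetical hook into the product pipeline.

```python
# Hedged sketch of model regression testing: a new model version is only
# promoted if it scores at least as well as the current one on a pinned
# benchmark of tasks with checkable expectations.
BENCHMARK = [
    {"question": "Is Jenkins still good law?", "must_mention": "Jenkins"},
    {"question": "Quote the holding of Smith v. Jones.", "must_mention": "Smith v. Jones"},
]

def run_skill(question: str, model: str) -> str:
    # Hypothetical stand-in; in production this calls the real RAG pipeline.
    return f"[{model}] Jenkins was overruled; Smith v. Jones held..."

def evaluate(model: str) -> float:
    passed = sum(
        1 for case in BENCHMARK
        if case["must_mention"] in run_skill(case["question"], model)
    )
    return passed / len(BENCHMARK)

if evaluate("candidate-model") >= evaluate("current-model"):
    print("Candidate passes the benchmark; eligible for promotion.")
```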

Nathan Labenz: (43:06)

So today, you use your own framework. You have a suite of automated tools that confirm that you're still getting the right answer on all the key questions.

Pablo Arredondo: (43:16)

We merged with Thomson Reuters. So now with Thomson Reuters, we've created the master skills factory. The testing, the prompting, the whole flow. Because part of it is the velocity of creating these functionalities. That's another dimension of competition. It should be, how quickly can you go from the user need to something that you can put out there and trust. And so one of the exciting parts of collaboration with Thomson Reuters has been... I mean, talk about quality control. Thomson Reuters is the absolute gold standard, and they've been doing it since long before there were computers, really making sure of editorial excellence on their Westlaw citator. And nothing's infallible, but Thomson Reuters, part of why Jake and I and Laura were like, these are the best partners we could have is because there's just nobody better at ensuring that you can trust the output. And so that's one area where our techniques and approaches and our philosophy merge very well with theirs. And so I think we've combined it, and I think we're going to be amplifying it and magnifying it in terms of all the things that we could do.

Nathan Labenz: (44:18)

Are model-powered evaluations part of that framework?

Nathan Labenz: (44:24)

We use, for example, GPT-4 to assess the 3.5 scripts on pretty subjective dimensions, and ask it to give us a rating. And I'm always like, I don't really trust that, but I think I at least trust it enough to say if the average rating takes a dive, then I should be paying attention.

Pablo Arredondo: (44:43)

We're almost entirely GPT-4. So you're not... You know what I mean? It's not like we have GPT-5 to go police GPT-4. So when you have GPT-4 policing GPT-4, it raises questions; it's a different calculus in terms of how useful that could be. We may be doing some experiments for that. So one of the things we're doing behind the scenes is what happens when you apply GPT-4 to 3 million documents? These huge corporate discovery matters where you have a several-million-doc review. And so you can have it review the documents, tell you which ones are relevant, and describe why it's relevant and give it a score. Then what happens is there's a lot of documents that get the highest score. And you look at the descriptions and you're like, wait a minute. Some of these are clearly, palpably more important than others. So then you say, well, what happens if GPT-4 gets involved again and reranks based on the descriptions? And suddenly, you have a much more intuitive list. So using GPT-4 to enhance the output of an earlier GPT-4 is certainly something I think that has a lot of potential. And it may be that, as a complementary technique, using it to police for quality control is important. I don't think I see a world where we completely turn things over to GPT-4 to do quality control yet, although I would be surprised if we're not doing some experiments at least in limited ways that can help.
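The two-pass review Pablo describes can be sketched as a scoring pass followed by a rerank pass over the rationales of the top bucket. A minimal sketch with illustrative prompts, assuming the OpenAI Python client:

```python
# Hedged sketch of LLM-assisted document review: pass 1 scores each document
# and writes a one-line relevance rationale; pass 2 reranks the top-scoring
# bucket by comparing those rationales against each other.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def score_document(doc: str, issue: str) -> str:
    return ask(f"Issue: {issue}\nDocument: {doc}\n"
               "Rate relevance 1-5 and explain why in one sentence.")

def rerank_top_bucket(rationales: list[str], issue: str) -> str:
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(rationales))
    return ask(f"Issue: {issue}\nThese documents all scored 5:\n{numbered}\n"
               "Order them from most to least important, by number, and say why.")
```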

Nathan Labenz: (45:57)

Okay. Let's do the same thing that we just did for evals on the embeddings and RAG side. Again, you've been working on that close to 10 years before this moment. Now, of course, there's a rush of new embeddings options. OpenAI's got them. Other people have got them. Sounds like you're still using your own core embeddings tech that predates the GPT-4 moment. What could you tell us about, like, how do you chunk stuff? Yeah. I mean, I have so many questions, but tell me everything about RAG.

Pablo Arredondo: (46:31)

Yeah. So chunking, okay, that's a great area. Because one of the things you want to do is put domain expertise into it. Don't chunk where the question from a transcript gets separated from the answer from a transcript. Those are better thought of in pairs. So we did have some domain-specific chunking that went into it. With the embeddings, yeah, that was one of these rare moments where our in-house thing, a thing we had built... Wow, it does seem to be outperforming, at least. But again, this is late 2022, early 2023, which in this field is like a decade, it's like a century in terms of how things have progressed. But there is enough weird nuance with case law. It is a strange corpus in some ways, and we did feel like our own home-trained system was working better than what we saw. We're constantly evaluating that. Our main thing is just the best user experience. So the moment we think that there's embeddings that will create a better experience, we will switch. And then what's sort of funny with RAG, I mean, I... And this might be tangential to what we're talking about, but one thing that's been interesting is, you know, you have these firms, and they're very happy. They're very proud of their legacy, and they're very proud of what they've collected and built, which they should be. They've been around for 80 years. And there was this sense of, like, well, can you create a GPT-4 with just our data? And they're like, I know it's read all of the Internet, but wait till it gets a hold of our summary judgment motions. That'll really kick things into another gear. And you're kind of like, no. I don't think this is really going to move the needle. What you want to do is RAG. You say, no, no. Point GPT-4 at your documents. That's how you leverage what you have. But there was a sort of marketing of, we'll make your own model just for your firm, which I think, frankly, there was no evidence that that was actually going to have a better outcome, but it just fit the ego of the firm and their natural desire to want to have a competitive edge. So that was one thing we sort of encountered early on. I think now firms are coming around to, let's just use RAG. Let's just point it at our stuff, and how best do we leverage what we have, as opposed to, I want to go spend a bunch of money to create a new model that's everything GPT-4 saw plus our relatively paltry amount of content.
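The domain-aware chunking point is easy to show concretely: in a deposition transcript, a question and its answer should land in the same chunk. A minimal sketch, assuming "Q." and "A." line prefixes:

```python
# Hedged sketch of domain-aware chunking: keep each question and its answer
# together instead of splitting on a fixed character count.
def chunk_transcript(lines: list[str]) -> list[str]:
    chunks, current = [], []
    for line in lines:
        if line.startswith("Q.") and current:
            chunks.append("\n".join(current))  # close the previous Q/A pair
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

transcript = [
    "Q. Where were you on May 1?", "A. At the office.",
    "Q. Who else was there?", "A. Just my assistant.",
]
print(chunk_transcript(transcript))  # two chunks, each a complete Q/A pair
```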

Nathan Labenz: (48:34)

So what happens when somebody brings their own data to the table? I created my account. I'm dropped in there. It's like, okay, you can create databases. There are six different kinds of databases that you can create. I assume that those are pretty similar core underlying technology, but perhaps with some different processing or prompts.

Pablo Arredondo: (48:55)

Nathan, because listeners to your podcast deserve the unvarnished truth, I think they might be absolutely identical. I think those six options were for education purposes to tell you, these are things you can do. I don't think we're switching up the embeddings. Maybe for one of them, we might have something specialized. But yes, to your point, working with law firms, security becomes the number one, the immediate thing that you're always talking about. You can be showing somebody a time machine, and their first question is going to be about their client's privacy. And it should be. This is important stuff. So our philosophy has just been, obviously, it does go through GPT-4. So it does leave, but then you can have it immediately deleted. Or if you want to keep it on our system with our embeddings, you have that right, but you can have it deleted every day. We went through the SOC 2 compliance. All of the various rigorous security stuff you have to go through. And of course, we don't train any models with your data. And there was this little window where that was a differentiator between us and OpenAI. Because they're sort of like, why don't I just go to OpenAI? And for this little period, it was at least ambiguous whether OpenAI was training on the data. They have naturally come to the same place that everyone working with enterprise comes to, which is no, we won't unless, you know, you want us to, basically. So I think that's your point: how do you deal with the security and privacy? Is that the main question?

Nathan Labenz: (50:15)

Yeah. That's interesting. I was actually even more thinking about just the technology. It sounds like you're taking these documents, passing them through GPT-4 to be chunked. Perhaps there's also metadata being extracted. And then I'm wondering everybody's looking for tips on vector databases or a point of view on the future of vector databases. So I'd be curious which ones you're using, if it's a hybrid structure, if there's a graph that's being synthesized. All these sorts of very practical lessons learned, I think, are great.

Pablo Arredondo: (50:45)

Right. So I'd say that we have domain specific chunking we do on our end before... So we sort of go there. I don't believe that we're right now throwing in the latest vector databases on it. So I'm actually not a great person to ask for that kind of stuff because we've just sort of been sitting with our homegrown common law specific one, which at least last I checked has been, for us, more performant than... Better accuracy than the ones that we've tried. It's interesting because we were doing this stuff early before GPT-4, we kind of set up a bunch of stuff that we're still sticking with in terms of those embeddings. Now that could be something where we need to examine that and say, no, like, just... You're doing it early, but now it's obsolete. But that's just something we're testing constantly, and so far, we haven't seen a real reason to switch. We certainly have the case law citation graph going back to our traditional research tool. When you're reading a case, you have to see who cites to it, and that's the whole thing. We're not doing anything with graphs with GPT-4 specifically. We're not leveraging it that way.

Nathan Labenz: (51:43)

I've always got a handful of different apps that I'm thinking about this with. One other one, just for context, is I'm working as an AI advisor to a company called Athena. We're in the executive assistant business. And...

Pablo Arredondo: (51:56)

Wait a minute. I think I use Athena for my executive assistant. Yeah.

Nathan Labenz: (52:01)

This is early days for us as well, but of course, we want to be more efficient. We want to bring AI to more of the things we're doing. One of the experiments we've been working on is: can we create some sort of retrieval-augmented chat experience that allows the assistant to get deep context on the client, stuff that maybe they've never even talked about before? We certainly don't want to be asking the same questions repeatedly if we can avoid that, and this is especially important in the early days of a relationship. A challenge we've had in trying to build something like that is that you get all this information, right? People can upload anything, and then we chunk it. Maybe we could be doing more to use GPT-4 to do the chunking a little more intelligently; our approach is straightforward, so that could be an area for improvement. But I think a lot, too, about: if I match on this chunk, where did this chunk come from? With naive chunking strategies, that stuff often gets lost. And also, what's the timestamp on that document? If you go pure vector database, sometimes you don't necessarily have all those features. You could go to something like Postgres, which certainly is not a native vector database but is adding these capabilities. So I'm wrestling with all that sort of stuff: how much preprocessing to do, how much structure, how much synthetic metadata.
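
For readers wondering what this looks like in practice, here is a minimal sketch of the Postgres approach Nathan alludes to, using the pgvector extension so that provenance and timestamps live right next to the embeddings. The schema and names are hypothetical, not Athena's actual system.

```python
# Hedged sketch: store chunks with provenance metadata in Postgres using
# the pgvector extension, so similarity search can also report where each
# chunk came from. Schema and names are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=assistant")
cur = conn.cursor()

cur.execute("""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    doc_id    text,         -- which source document the chunk came from
    doc_time  timestamptz,  -- timestamp of the source document
    chunk_ix  int,          -- position of the chunk within the document
    content   text,
    embedding vector(1536)  -- e.g. OpenAI embedding dimensions
);
""")
conn.commit()

def nearest_chunks(query_embedding, k=5):
    """Top-k chunks by cosine distance, with provenance intact."""
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    cur.execute(
        """
        SELECT content, doc_id, doc_time, chunk_ix
        FROM chunks
        ORDER BY embedding <=> %s::vector
        LIMIT %s;
        """,
        (vec, k),
    )
    return cur.fetchall()
```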

Pablo Arredondo: (53:26)

Such a great area. Yeah, I'm convinced. Right now, we're throwing CoCounsel at the raw text; chunking is basically the only favor we do for it. The way I think about this is: if you were an assistant at a law firm and you just started, and they said the night before your first day of work, you can go into the office and open all the files, what would you do to make your life easier? You might create a cheat sheet: here's the contact information for the attorneys in the current pending litigation, and I'm going to put that in a new document that's just for me, to be able to pull it more easily. That's what humans do. That's what a Rolodex is, right? I think there's a lot there in what you're looking at, which is: can you land GPT-4 in a new information environment where at least it knows the general area, that it's working for a lawyer? Can it go and rename files in a way that makes them easier to work with? Can it look at the first page of each document to create a new layer that it can then interact with more quickly? I think there's a lot there, and there's going to be a lot of art in doing this really well. I hear what you're saying, and I think you're right; it completely makes sense for an assistant. We're doing some experiments with it, where you could type in the docket number, which is just the unique identifier for a litigation. We pull the docket, and now we're starting to parse out a bunch of information even before you run a skill, so that when you do run a skill, when you say send a letter to IBM's counsel, it can go and do a more robust job of that than it otherwise would.
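
The "cheat sheet" idea is easy to prototype. Here is a hedged sketch of one way to do it: run each document's first page through the model once, up front, and keep the one-line answers as an index layer. The model name and prompt are illustrative assumptions, not CaseText's pipeline.

```python
# Hypothetical sketch of the "cheat sheet" layer: summarize the first page
# of each document into one line, so the model (or a human) can find files
# quickly later. Prompt and model choice are illustrative only.
from openai import OpenAI

client = OpenAI()

def build_cheat_sheet(documents: dict) -> dict:
    """Map filename -> one-line description derived from the first page."""
    sheet = {}
    for name, text in documents.items():
        first_page = text[:3000]  # rough stand-in for "the first page"
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system",
                 "content": "In one line, say what this legal document is "
                            "(type, parties, date) so it can be found later."},
                {"role": "user", "content": first_page},
            ],
        )
        sheet[name] = resp.choices[0].message.content.strip()
    return sheet
```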

Nathan Labenz: (54:56)

It sounds like you're throwing GPT-4 at every problem as the first approach, which is something I've often recommended: don't get too cute too quick. See if GPT-4 can do it; try to make that work before you try anything else. I think one reason a lot of people don't do that is that they worry it's going to become too expensive and, perhaps, just not viable. So I actually thought the CoCounsel pricing was one of the most interesting aspects I encountered in my exploration of the product. There are a couple of different prices, which you can clarify, but basically it comes in at something like $200 a month, which is obviously 10x a retail ChatGPT subscription. People may balk at that a little bit, but I think it's really smart. I've been thinking more generally: why don't people just build the very best thing they can with the very best models they can possibly access, and charge whatever it takes to make that at least viable enough that you're not outright burning money? We've seen a pretty clear trend of the models getting cheaper, so presumably some margin will come back to you. First of all, is that the strategy? And how have you been thinking about this question of pricing, and how have people responded to it?

Pablo Arredondo: (56:23)

First, I mean, look, we were so spoiled. We got GPT-4 very early, for free; we weren't charged. We were just swimming in it, and I just loved it. It's the equivalent of growing up like a Saudi prince: you don't even have a concept of the cost. All you're doing is enjoying the exploration of this model. The guidance we got from the folks at OpenAI, too, was: look, first build the best, and then optimize. That has generally been our approach. But then, of course, the bigger consideration is that it has to work, and it has to work for law, and it has to work reliably. A lot of the times when we tried to test other models, they failed there. So there you go; it makes it easy, in a way. The truth is, things are catching up. But for the majority of the last year, GPT-4 was a big leap ahead of even what was in second place. Tell me if you disagree. For us, that leap meant the difference between we can put this in lawyers' hands and we can't, with confidence, with reliability. Now, are there subtasks within some of the flows that you could delegate to 3.5? A lot of our machine learning folks would be annoyed with me for how much GPT-4 I'm throwing at everything. They're probably like, you've got to start thinking about these other models. But for us, again, it has to work, and it has to work reliably. That does often mean that for the majority of tasks, GPT-4 does it, the other ones we tried don't, end of discussion. Again, as this evolves, we're constantly looking at it. Things are scaling up, and you're increasingly going to want to look at where you can economize. But basically, law itself demands that we always focus on quality, and to date, by and large, that has meant GPT-4 or nothing, frankly.

Nathan Labenz: (58:18)

So how have people responded to that pricing? I mean, I can imagine on the one hand, people might feel like that's more expensive than other software products. I feel confused about maybe the future of legal billing because I'm like, if there's one thing that this tool is supposed to do, it's supposed to save you time. But if you're charging for your time, how does that work out? In any other business that I'm in, I would say, well, jeez, $200 a month. If it saves me one hour, it allows me to bill one hour more, then great. But wait a second. Those are not the same thing in the legal space traditionally.

Pablo Arredondo: (58:55)

The market has responded. I say the market; for me, it's always the profession. With the legal profession, again, you've got to recognize there's a difference between your huge multinational firms and your solo practitioners. But the $200-a-month pricing generally is such that solos and smaller groups kind of just want to buy it. If they think of it in terms of a legal technology tool, they're like, wow, that is pricey. If they start to say to themselves, wait, I would have to hire a paralegal to do that, suddenly it becomes quite a bargain. Part of why I think the price has been well received is that they're getting that these are things that go beyond just having Thomson Reuters or Lexis or a document management tool. It's really getting into tasks for which you would have to hire a human and pay a lot. We're always trying to make it more affordable. I read that in the early days of electricity, only Wall Street and Madison Avenue got streetlamps, and to some extent, there's probably some of that going on right now. Especially if you want to do high-throughput stuff: these large firms have much deeper pockets, and the litigations are such that it makes sense to spend more money given the risk of losing the case. They're able to do things with GPT-4 right now that are probably impractical for your everyday attorney representing you or me. But like you said, it's getting better, and we continue to try to design prompts in a way that lowers the intensity, so that you can bring down the cost, and things like that. On the whole, we were a bit worried too, because, yeah, that is a lot; we've seen lawyers balk at lower numbers when it came to traditional research tools. But here, once they start thinking of it almost like this weird colleague and not a tool, the response has been pretty positive.

Nathan Labenz: (1:00:46)

So one of my big theories in AI in general is that we'll see a lot of consumer surplus, meaning, in the classic economic definition that willingness to pay will greatly exceed the actual price. Is that basically the trend that you think you are enabling for your customers' clients? Like, are they just getting more for the same bill or their bills are shrinking perhaps because it's just taking less time to do the job?

Pablo Arredondo: (1:01:15)

It's funny with pricing. I mean, Jake, Laura, and I, we were lawyers by training. We love legal tech, legal informatics; pricing expertise was not necessarily what we were best at. We've joined Thomson Reuters, which represents a much more mature, sophisticated understanding of how you go and test and measure these things, if that makes sense. I think there are a couple of things. First, there's going to be competition, and, well, there should be. That's one of the benefits. It's not like only one company has this and therefore can just charge whatever they want. That, I think, keeps you honest on it. That said, Thomson Reuters is built on trust. When your matters are important, you go with Thomson Reuters, and that does involve more testing. That does involve perhaps more expensive models sometimes, and things like that, and that will be reflected in the price. So first, I would defer: there are probably literally 50 people at Thomson Reuters who are far more qualified to talk about how they're thinking about the pricing. But generally, it's that tension: of course there's competition, but at the same time, there's a caliber of product that people expect from us and that people need from us, and that will always be reflected in the price.

Nathan Labenz: (1:02:25)

How about how you measure reliability? Obviously, a huge theme of your comments has been the critical nature of reliability. I think you said de minimis hallucination a little bit earlier. Do you have metrics that you watch, or a bar where, as long as you're below that hurdle, you're good? Could you be confident enough to say: use our product and you'll never have one of those embarrassing moments?

Pablo Arredondo: (1:02:52)

You can't say that about humans, right? You can't say that about hiring a human, that you'll never have an error. I think what's important, too, is that it's not just these cartoonish hallucination problems, which we avoid not just through the RAG architecture; we also have the cases underneath, with links, and we police it. So if you see that link, that's a real case. It can't not be a real case. So we are able to give certainty on that stuff. But even beyond those grotesque hallucinations, if you will, it's not infallible. Sometimes it can misread what a judge is writing, or misunderstand what the user intent is. That's really where we're the most focused: how can we get better and better there? I was always averse to having the thumbs up, thumbs down back when we were just a search engine, because I was like, we're supposed to be the professionals. Imagine if your doctor was like, hey, Nathan, I recommend this prescription. Hey, how'd I do? What do you think, buddy? It's like, you're the pro, dude. You should know. Boy, did I get over that when it came to LLM stuff. So we have an ability to give real-time feedback, and then we just call and follow up. We have a weekly meeting where we do nothing but talk about any complaints that came up. And then we have our systematic quality control constantly running tests, et cetera. You're never perfect. It's going to be something where you're continually striving and striving to get better. But I'm confident in saying that nobody does more than we do in law to make sure that it's working.
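
The "police it" step Pablo describes, making sure every link resolves to a real case, can be pictured as a post-processing pass over the model's output. The sketch below is my own illustration of that idea, with a deliberately simplified citation pattern and a hypothetical case database; the real system is not public.

```python
# Hedged sketch: verify that every citation in a generated answer resolves
# to a real case before rendering it as a link. The regex is a simplified
# stand-in for a real reporter-citation parser; case_db is hypothetical.
import re

# Roughly matches citations like "410 U.S. 113" or "558 F. Supp. 2d 532".
CITATION_RE = re.compile(r"\d+\s+[A-Z][\w.]*(?:\s[\w.]+)*\s+\d+")

def police_citations(answer: str, case_db: dict) -> str:
    """Keep only citations that resolve to a known case; flag the rest."""
    def check(match):
        cite = match.group(0)
        if cite in case_db:  # resolves: render as a live link
            return f"[{cite}]({case_db[cite]})"
        return f"{cite} [unverified citation removed]"  # never show a dead link
    return CITATION_RE.sub(check, answer)
```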

Nathan Labenz: (1:04:18)

For somebody like me, so I have a couple of different ventures I'm involved with. At Waymark, we do not have in-house counsel, and fortunately, we don't often have to avail ourselves of external counsel either. I guess I don't even know if you would sell this product to non-lawyers. Is it just straight-up not available? Do you have to be licensed to even become a customer?

Pablo Arredondo: (1:04:40)

I mean, yeah, we made the early decision not to go to what's called pro se users in law, which is a little bit different from corporations that don't have GCs, because it is so misleading: you think, look, I've got this memo, I'm done. It's written just like a lawyer would write it. Yeah, no. A lawyer can look at that and see that there's more nuance, that there are certain exceptions the lawyer knows about. So we thought it would be dangerous. Now, to be clear, one of the most important things, probably the most important thing, these large language models are going to do, big picture, is provide services for folks who can't afford lawyers, or provide some form of service, some form of help in some way. There's a colleague of mine at Stanford, Margaret Hagan, who's working in that area. I defer to her; go study her work. She's fantastic. So to be clear, don't get me wrong: that is going to be one of the most important ways we use LLMs, and I think as a society, we should be judged by how well we solve that really critical problem. From where we sit at CaseText, we did not want to give folks the misleading impression of, oh, I have a lawyer, it's called CoCounsel, and I can go in there and do it. And separately from that, there's this unauthorized practice of law issue, right, where it's like, are you even allowed to purport to be a lawyer? But frankly, it wasn't fear of reprisals so much as understanding that we didn't feel comfortable doing it. So, look, if your company wanted to use CoCounsel, could you use it? Are there a lot of skills that are sort of universal, where you could put in a hundred transcripts from speeches or whatever you cared about? Yes, you could use it. But we're certainly not selling it by saying, you don't need a lawyer because you've got CoCounsel, you're all set. It's just not there yet.

Nathan Labenz: (1:06:19)

I've had a few interesting experiences along those lines, even just for personal stuff. I get a contract, say for some 1099 work, and I had one from a big tech company that has a reputation for having a lot of very restrictive clauses in their contracts, at least that's the sense I had. So I just take it to Claude and to ChatGPT and say, hey, does anything jump out at you from this contract? If neither one flags anything that seems meaningful to me, then I'll usually just say, okay, that's good enough, let's roll with it. I imagine I could do better with a CoCounsel seat, but I also understand why you would be reluctant to put yourself out there in that way.

Pablo Arredondo: (1:07:03)

Yeah. I mean, it starts with, well, what jurisdiction are you in? That can determine whether some provisions are even enforceable, since it varies by state. It gets more nuanced than what you're going to get from a chatbot, or even, frankly, from a more specialized tool like CoCounsel. What we build is a tool for lawyers. It can make your legal bill go down. It can help your attorney not overlook something, but it's just not a replacement. Not yet. Now, there are many folks who believe that's the goal and that's where we're going to get. There's such debate about how far these LLMs will go as we scale them up; much, much brighter minds than mine are debating where this plateaus. So I can't rule out that further generations will get there, but at least for right now, we're not there. We're not close to there.

Nathan Labenz: (1:07:53)

Yeah, that's maybe a perfect transition to the future-of-law section that I thought we might spend a little time on as we get toward the end. For starters, I sometimes call the question of where the models will plateau the $100 trillion question. That's inspired by the size of the global economy, because AI could very plausibly end up touching that entire $100 trillion with not too many more leaps. What would you say are still the weaknesses of GPT-4? You mentioned one: missing things in the middle of the context window. I also meant to ask whether you're using function calling at this point, or whether you're still doing your own custom implementation of it.

Pablo Arredondo: (1:08:35)

No, I think we're increasingly folding in the function calling, porting some things over. Again, we want to leverage all their great stuff; it's just that we have a slower process, because we have to make sure it's not screwing anything else up. But yeah. So, look, where it is now, it lacks the ability to understand the broader context of what it's working on. It can't give you legal and strategic advice, right, because it doesn't really understand where these things fit in. I don't think it can write persuasively enough to really write briefs. But let me caveat that: I have a bias. Some people accuse me of having a romanticized view of writing. But to me, writing is thinking; as you write, you think. Paul Graham, the great YC founder, has written on this, I think, and I completely agree with him that it's very dangerous to let the writing atrophy. We can't spell anymore because the red squiggly line will bring us home. Nobody can read a map, because why would you when you have GPS? But I think writing is qualitatively different. Wrestling with that blank page is something we should jealously guard, at least for the types of writing that really involve substance. Look, there are certain types of legal documents, declarations, supporting documents, where, sure, of course. But for actual advocacy, and the same thing for judges: we recently had a judge raise the question of, well, could this thing start running first drafts of opinions? I think that's very dangerous. I think we lose something about the evolution of law, the way that it mutates and changes because of the originality going into it. Other folks would disagree. I think that's going to be one of the most important questions in the future of law: what happens to that? Are we letting AI just generate a brief that, with some oversight, essentially goes out under the lawyer's name, and then the judge takes the two briefs, puts them into AI, and, oh, here comes the opinion, with some oversight? That's one of the profound questions. I think you'll start to see more AI arbitration. But, you know, there's a guy in England, Richard Susskind. He was the OG AI-and-law guy, going back 30 years. He points out that whenever a new medicine comes out, the doctors only talk about the patient: what does this mean for patients? You don't hear them say, what does this mean for our billables? And he exhorts us in law to have that same professionalism, because ultimately this is a profession. We owe a zealous, undivided duty to our clients, and if that means we have less money at the end, that's completely fine relative to our duties. So I think the more serious lawyers among us are going to say, how can we better realize the goals that we all set for ourselves with this stuff? And the truth is, we're woefully behind, so there's so much room for improvement that I think we'll see AI get a lot of adoption and start to make processes faster and less expensive, and, if used correctly, even increase the quality of the pursuit of justice.
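
On the function-calling point at the top of that answer: for readers who haven't seen it, here is a minimal sketch using OpenAI's tools interface. The `pull_docket` skill and its parameters are hypothetical, invented for illustration; they are not CoCounsel's actual skill API.

```python
# Hedged sketch of function calling: the model can elect to invoke a
# (hypothetical) pull_docket tool and supply structured arguments.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "pull_docket",
        "description": "Fetch the docket for a litigation by its number.",
        "parameters": {
            "type": "object",
            "properties": {"docket_number": {"type": "string"}},
            "required": ["docket_number"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Send a letter to opposing counsel on docket 3:21-cv-01234."}],
    tools=tools,
)

# If the model chose to call the tool, inspect the structured arguments.
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```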

Nathan Labenz: (1:11:42)

Yeah, I appreciate that, and I think broadly, I really agree with you. Raising the floor and improving access to expertise for people who all too often don't have it is, to me, one of the most exciting things about the AI moment. At the same time, I think it is, at least for now, really important that we do not cede control of the frontier, of where the critical aspects of society are going. Very specifically, I was going to ask, and you touched on it a little bit: in the medical profession, as you noted, I've been very impressed with how positive and non-defensive the reaction to AI has been from most corners. I would say the same seems broadly true of the legal profession, although I maybe don't have as much data on that. So I'm curious how you see that shaping up: what reaction do you expect from the bar? Could we get to a moment in the near future where you'd be guaranteed some right to legal AI advice as well as human advice?

Pablo Arredondo: (1:12:53)

Yeah, that's so interesting. Maybe I can contrast two essays written by the same person: Chief Justice Roberts, the Chief Justice of the Supreme Court. At the end of the year, he writes an essay about whatever topic he wants. In 2016, he wrote about legal technology. I'm paraphrasing a bit; I recommend you go read the real thing, as always. But he, I don't know if brags is the right word, but he sort of says, we're really slow, and that's a great thing. And he points out something I don't think most people know: the tortoise and the hare are actually engraved on the Supreme Court building. There's a turtle and a rabbit carved into that building. They've got Moses up top, and then below, you've got this Aesop fable. And he points to it and says, yep, that's us, the turtle. And I would say there are more options in life than a turtle and a narcoleptic rabbit that falls asleep five minutes before it's supposed to win the race. So that's 2016: I wouldn't say anti-tech, but very much talking about how we should be slow, almost as if it's a feature, not a bug. Fast forward to 2023, and he writes an essay about generative AI and LLMs and says things that I've been screaming from the mountaintops. He says the first rule of the Federal Rules of Civil Procedure, the rules that govern the protocol, calls for the just, speedy, and inexpensive resolution of matters, and surely AI can help us better realize that. By us, he means the courts. At one point, I think he says legal research will soon be unimaginable without this. Very pro, very much not reckless, and I think at the end he says judges will still have a job. But certainly, exactly as you said, pleasantly surprising in terms of being receptive and acknowledging that this could bring a lot of value, that it's not just something to be scared of or contained. To me, that bodes very well. And I've seen it in other parts of the profession: I think there's a real willingness now to engage with this stuff, not from a how-do-we-destroy-it-or-kneecap-it posture, but asking where are the ways we can use it.

Nathan Labenz: (1:15:03)

Yeah. You could almost put that excerpt right on your website. I'm not sure how that would...

Pablo Arredondo: (1:15:07)

Oh, no. I went T-shirt, website. I might even add the ink; I might even tattoo that on. No, it's wonderful to see that. I think for lawyers, this is a time to rejoice. This is profoundly useful technology that will help us do our craft and practice our profession in a much better way, a much more sane way, frankly a more enjoyable way. Now, I know there are certain billing structures built on, hey, that associate just did 12 hours of basic doc review, and I was charging $600 an hour and only paying that associate $100. I appreciate that those paradigms will probably have to be reworked, but I think that pales in comparison to the transformation we're going to have in our profession, one that, frankly, is in dire need of that kind of transformation.

Nathan Labenz: (1:15:57)

Yeah. You mentioned also AI arbitration. That's something I've been kind of fascinated with, but surprisingly haven't seen, and I guess maybe not surprisingly because, again, the timelines are all really short, so it's going to take time to work this stuff out. But I wonder if you could sketch that vision in a little more detail.

Pablo Arredondo: (1:16:14)

My ConLaw professor was Larry Lessig, who has done a lot in cyber law and things like that. He was one of the very few people I showed GPT-4 to in the earliest days. I was expecting he'd calm me down, tell me to put some brakes on this thing. Instead, he immediately was like, for 35 years, I've been thinking about what we could do with an AI judge; this is great. And I was like, Professor Lessig, that's way more ambitious than I expected, though I guess I shouldn't be shocked that he had even more ambition for it. So this idea predates large language models, this attempt to encode rulemaking and adjudication and legal logic. And it was all just hopeless: laws are just so fuzzy and messy, et cetera. But now, with the advent of GPT-4 and these capabilities, the idea is: both sides consent and submit information, and what you've done in the background, through RAG and a lot of prompting, is basically encode dispute resolution such that it can spit out an answer. Both sides have agreed, so I guess that's the answer; if not, maybe there's some right of appeal to a human, and things like that. If you take one very specific type of dispute, maybe you could get there. I'll be honest, every time I think, oh, GPT-4 is not going to do that, somehow it surprises me. So that's certainly an area where I think you're going to see movement. Like you said, it's very early days. But folks who were thinking about this a lot even before the LLMs, I think, feel a whole renewed sense of what they can do now.

Nathan Labenz: (1:17:54)

Certainly, one of the big lessons of the last few months is that we keep finding new ways to advance the capability frontier even just of the current system. So that has been remarkable to continue to watch. I guess maybe last question is around the law of AI. I wonder if you have any takes on should we have standard product liability for AI products? How should we think about who is responsible when AI contributes to some harm? Also interested in your thoughts, if any, on things like right to train. Should all the data that's out there be fair use in your mind? Or should there be some sort of profit sharing?

Pablo Arredondo: (1:18:45)

Alright. For fair use, what I'd do is send your listeners to an article coauthored by Mark Lemley, a professor at Stanford, along with, I think, other researchers at Stanford; it's a deep dive on the fair use issue, and he's infinitely more qualified to talk about it than I am. The truth is, where I've intersected with this stuff is in giving input to courts and to state bars that are wondering: do we need new rules to govern this, or can we just use our existing rules? I'm actually of a mind, at least for that aspect of things, that the existing rules, properly applied, get you 99% of the way there. There's a duty of technological competence that's now in most states. There's a duty of candor to the court. You didn't need any new rule to tell a lawyer that not reading the case before you send it to the judge is not going to be proper lawyering. So really, the area I've focused on has been somewhat narrow: regulation of the bar, et cetera. I look forward to taking a deep breath and diving into these really rich issues around antitrust, the right to train, and all of these things. And certainly Thomson Reuters, because we're international, we've just launched in Australia and Canada, is increasingly facing not just one but multiple different regimes on this. So what you might want to do is have my colleague Laura on, who was our GC at CaseText. She's one of our cofounders and is now in global affairs, something wonderful like that, and she might dive into the policy stuff a bit more. It will keep lawyers busy for a very, very long time. Actually, DLA Piper would be a law firm that is particularly focused on being a leader in this space across a lot of areas. They were very early with CoCounsel; they basically helped us co-develop it. And when they finished helping us with the product, they could go down the hall and represent AI companies that are facing these issues. So let me defer to better folks than myself, and I'm happy to introduce you to some folks offline afterwards.

Nathan Labenz: (1:20:46)

Cool, sounds good. I'd appreciate any connections, and I certainly appreciate some intellectual modesty as well; it's not all about the hot takes. I guess one last question, if I can sneak in one more. What has it been like being acquired by an obviously much bigger, much older company? That's a highly individualized experience, but I'm particularly curious about how the acquiring company is thinking about AI. Do you have people coming to you from everywhere saying, hey, you're the AI guy, I need your help in my department? Or are they still kind of waking up to it? Being a startup guy who's never been acquired by a big company, I don't have that story to tell myself.

Pablo Arredondo: (1:21:27)

Right. Yeah. Well, one day maybe you will, or maybe you'll just go right to the public markets either way. With Thomson Reuters, the reason we were acquired is that they were awake to how important this stuff is. Although they're a very long and storied company, and you might think they would be dragging their feet and relying on their legacy products, it's been quite the opposite. Steve Hasker, the CEO, announced: look, we want to bring this to bear across all of our verticals in a very deep way. This might be unlike other acquisitions, where you're brought on and kind of siloed to do your one thing. We were brought on to join forces with what they're already doing, which is: how do we create as much value as possible for our clients and our customers across everything we do? That's been very invigorating. It's perhaps unusual, and I don't want to tell everyone, oh, every acquisition will feel like this. But in this instance, we've been really encouraged, because we don't feel like, oh, they don't get it, or they're not doing enough with AI. They call it their build, buy, partner approach: they're already spending a ton of money doing their own stuff in-house, they're looking for the right partnerships, and then they're making strategic acquisitions like CaseText. Then, on a personal note, law runs on content, and Thomson Reuters has the most beautiful content. It's like you've been building with sticks and mud, and suddenly you have the Roman Empire with all of its Phoenician dyes and Greek marble. A lot of the things Jake and I really wanted to build were frankly limited because we didn't have this content or that content. With Thomson Reuters, you really have these amazing streams of content, whether it's dockets or law review articles or Practical Law. So that also feels great, because at this point, there are no excuses; whatever we deliver is on us, because we really now have all the pieces: the content, the support of the larger organization, and fantastic colleagues. Thomson has some really great folks who have been putting their shoulders into building tools for lawyers since long before CaseText, before Jake and I were even part of this world. So it's been really encouraging so far.

Nathan Labenz: (1:23:41)

Cool. I love it. Anything else you want to touch on that we haven't got to?

Pablo Arredondo: (1:23:45)

I'll just say: if you're out there and you're building LLMs, I consider you an absolute hero of justice, and I hope you appreciate what happens downstream from your efforts. We're putting this to work to really make the world a better place, and I just applaud you. I know it is not easy to push these models forward, but know that you are truly creating value, at least from my little neck of the woods. Nothing has been more important or more valuable, and we will strive to really deserve it and to keep making the most of it.

Nathan Labenz: (1:24:14)

I love it. That's a great note to end on. Pablo Arredondo, cofounder of CaseText, now VP of CoCounsel at Thomson Reuters, thank you for being part of the Cognitive Revolution.

Pablo Arredondo: (1:24:24)

Thank you so much, Nathan.

Nathan Labenz: (1:24:25)

It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.

Sponsor Read: (1:24:40)

Omneky uses generative AI to enable you to launch hundreds of thousands of ad iterations that actually work, customized across all platforms with a click of a button. I believe in Omneky so much that I invested in it, and I recommend you use it too. Use CogRev to get a 10% discount. Turpentine is a network of podcasts, newsletters, and more covering tech, business, and culture, all from the perspective of industry insiders and experts. We're the network behind the show you're listening to right now. At Turpentine, we're building the first media outlet for tech people by tech people. We have a slate of hit shows across a range of topics and industries, from AI with Cognitive Revolution to Econ 102 with Noah Smith. Our other shows drive the conversation in tech with the most interesting thinkers, founders, and investors, like Moment of Zen and my show, Upstream. We're looking for industry leading hosts and shows along with sponsors. If you think that might be you or your company, email me at erik@turpentine.co. That's erik@turpentine.co.
