NYTimes vs OpenAI: Generative AI and the Law with Cecilia Ziniti, Founder and CEO of GC AI

Nathan and tech lawyer Cecilia Ziniti discuss IP law, fair use, and OpenAI's transformative role in the latest Cognitive Revolution episode.



Video Description

In this episode, Nathan sits down with tech lawyer Cecilia Ziniti, Founder & CEO of GC AI, the AI for in-house counsel. They discuss the origins of IP law in the US Constitution to promote creativity, the "fair use doctrine" and how OpenAI could argue ChatGPT is transformative, Google's patenting strategy around the transformer, and much more. If you need an ecommerce platform, check out our sponsor Shopify: https://shopify.com/cognitive for a $1/month trial period.

We're hiring across the board at Turpentine and for Erik's personal team on other projects he's incubating. He's hiring a Chief of Staff, EA, Head of Special Projects, Investment Associate, and more. For a list of JDs, check out: eriktorenberg.com.

---

LINKS:
- GC AI: https://getgc.ai/
- Cecilia's Twitter/X: https://twitter.com/CeciliaZin
- Fair Use Index: https://copyright.gov/fair-use/

SPONSORS:
Shopify is the global commerce platform that helps you sell at every stage of your business. Shopify powers 10% of ALL eCommerce in the US. And Shopify's the global force behind Allbirds, Rothy's, and Brooklinen, and millions of other entrepreneurs across 175 countries. From their all-in-one e-commerce platform, to their in-person POS system – wherever and whatever you're selling, Shopify's got you covered. With free Shopify Magic, sell more with less effort by whipping up captivating content that converts – from blog posts to product descriptions using AI. Sign up for $1/month trial period: https://shopify.com/cognitive

Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off www.omneky.com

NetSuite has 25 years of providing financial software for all your business needs. More than 36,000 businesses have already upgraded to NetSuite by Oracle, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you're looking for an ERP platform ✅ head to NetSuite: http://netsuite.com/cognitive and download your own customized KPI checklist.

X/SOCIALS:
@labenz (Nathan)
@CeciliaZin (Cecilia)
@CogRev_Podcast

TIMESTAMPS:
(00:00:00) - Episode Preview
(00:04:30) - Introduction to Intellectual Property law
(00:07:30) - Overview of main types of intellectual property - patents, trademarks, copyrights
(00:10:30) - Could Google have patented the transformer? Google’s patent strategy
(00:16:30) - Claim construction and its parallel to LLMs
(00:17:17) - Sponsor: Shopify
(00:18:46) - Practical impact of open source on ability to enforce IP rights
(00:21:47) - Litigation, internal data privacy, and communication
(00:24:30) - Copyright law and fair use doctrine for transformative works
(00:30:22) - Sponsor: NetSuite | Omneky
(00:31:42) - Big tech company lawsuits when money starts shifting between incumbents and newcomers
(00:34:10) - When judges like a technology…
(00:34:57) - Scraping and Linkedin’s suit
(00:38:50) - NYTimes vs OpenAI
(00:40:00) - Copyright, creators, and music industry licensing
(00:42:35) - Transformative nature of work and “fair use”
(00:48:39) - Middle ground for the NYTimes and OpenAI case
(00:50:29) - Limits on getting injunctions and ongoing royalties from courts
(00:56:39) - Legalities of other modalities, like DALL-E images of copyrighted IP
(01:01:00) - OpenAI’s opt-out for licensing on the text modality
(01:03:58) - OpenAI’s weak to strong generalization
(01:07:00) - Pharmaceuticals and software, AI doctors, public good balancing
(01:10:40) - Why licensed professionals aren’t pushing back against AI
(01:15:36) - Duty to use tools for efficiency to a client
(01:17:04) - Impact of open source and ability to find someone to sue
(01:20:30) - Forward-looking thoughts on what IP laws should be for AI
(01:27:38) - Overview of Cecilia's company GC AI for lawyers

The Cognitive Revolution is brought to you by the Turpentine Media network.
Producer: Vivian Meng
Executive Producers: Natalie Toren, and Erik Torenberg
Editor: Graham Bessellieu
For inquiries about guests or sponsoring the podcast, please email vivian@turpentine.co



Full Transcript

Transcript

Cecilia Ziniti (0:00) The United States' strong intellectual property protection regime is what gave rise to Hollywood, what gave rise to Silicon Valley. Oh, hey. I trained my own GPT on all your work, and it's been so helpful. And she's like, okay, thanks. And it's sort of like, how am I supposed to feel about that? Maybe it's not verbatim copying and who knows what it outputs. But even if it rewords the phrases, should there be compensation for that? Of course, in that process, they're going to defend and say, no, it's called the general purpose transformer. The intended use of this device is like a VCR for your own personal use. My prediction is actually that there ends up being a commercial solution to that where there's some startup that's like, hey, GPT yourself. And then it's literally like, you can get her knowledge from there. If they're going to figure out AGI, they can figure out how to not have infringement. And there's people that have said that they're using GPT-4.5 to actually do the copyright analysis.

Nathan Labenz (1:01) Hello and welcome to the Cognitive Revolution where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas and together we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz joined by my cohost, Erik Torenberg. Hello and welcome back to the Cognitive Revolution. My guest today is Cecilia Ziniti, founder of GC AI, an AI assistant for in house counsel, previously general counsel at Replit and lead counsel on the Alexa team at Amazon, among other roles, and author of some of the most helpful viral Twitter threads that I've read about the legal landscape surrounding generative AI broadly and about the potentially blockbuster New York Times versus OpenAI case in particular. It's safe to say that legislators, from the founders of the United States (who, I learned from Cecilia, did specifically provide for the protection of intellectual property in the United States Constitution) to the more recent Congresses that have extended copyright protections and otherwise modified intellectual property law, did not anticipate a technology like generative AI when writing the rules that we live by today. And so it falls to the courts, for now at least, to figure out how to apply an established legal paradigm to an emerging technology paradigm. To be honest, aside from filing a couple of provisional patent applications over the years, including one just last year, which I wrote with the help of GPT-4, this is not an area that I've studied much at all myself. So I brought a ton of questions to Cecilia and proceeded from the most general and basic to the most specific and nuanced. Why

Cecilia Ziniti (2:45) do

Nathan Labenz (2:45) we have intellectual property in the first place, and what are its main branches? What legal frameworks apply to generative AI? What rights do content owners have? Does the New York Times have a good case? Is it really possible that the court could order GPT-4 to be destroyed, as the New York Times demands? How likely are the courts to rule on this question at all versus trying to force Congress to act? Are there any notably good proposals for generative AI law that are not already broadly known? And how will the US position inform, or perhaps be challenged by, different choices that other countries might make going forward? We cover all this and more in this conversation. As always, if you're finding value in the show, we appreciate it when you share it with friends. This episode, quite naturally, would be for the lawyers in your life. Now I hope you enjoy this conversation about the relationship between the law and generative AI technology with Cecilia Ziniti. Cecilia Ziniti, technology lawyer and founder of GC AI, at getgc.ai online. Welcome to the Cognitive Revolution.

Cecilia Ziniti (3:52) Thanks, Nathan. Glad to be here.

Nathan Labenz (3:54) I'm excited for the education that you are about to share with me and with the audience. So many of us are AI enthusiasts, AI builders, AI researchers, and we of course spend some time thinking about how is this going to impact the world. And that often gets played out in very positive visions of the impact we want to make. And certainly there's the worries too of the risks that we want to avoid. I honestly haven't spent a lot of time personally, and I suspect that's probably true for a lot of folks that listen to the show, thinking about some of the practical but super important interactions between these new AI technologies and something like the law. And obviously, this has come to the fore a bit recently with a ton of lawsuits increasingly being filed, the most notable of which is The New York Times versus OpenAI, which has the kind of counterparties to create a case that seems like it could be a big precedent setter and could be referenced in history books for many years to come. I want to get into all this with you, but I would love to start, if you would indulge us as people that are, at least speaking for myself, very much beginners when it comes to questions of the law, by just asking for a kind of introduction to intellectual property law. Like, why does it exist? What are the big sections of it? What are maybe some of the live issues? And then we'll get into kind of how AI enters the scene and how that changes things perhaps. But a good kind of brief foundation I think would be super useful.

Cecilia Ziniti (5:29) Of course. So intellectual property law, you're right to ask about it because if you look at the rise of the big tech companies, they're not big property holders, right? The property that they hold is intellectual. It's software. It's marketing. It's brand. If you look at literally the Fortune 20 in 1980, there was not a single tech company in there. And now I think it's 4 or 5, with, you know, the Magnificent 7 or whatever in the top. But in terms of the law, it's been long recognized that intellectual property is important to encourage. It literally goes back to the Constitution. So not a lot of people know this, but the Constitution, in the article that created Congress, Article 1. So Article 1, Section 8, Clause 8, talks about, to promote the progress of science and the useful arts, Congress shall allow for a limited time monopolies for authors and inventors. And so that's the IP clause, and that literally gave rise to the patent office, the trademark office. And a lot of people say that The United States' strong intellectual property protection regime is what gave rise to Hollywood, what gave rise to Silicon Valley. And so that's kind of the basis for it. And then in terms of the types of intellectual property, we can dive in if you'd like.

Nathan Labenz (7:00) Yeah. Okay, cool. I did not realize that it was that high up in the constitution. So when I think of this, at least 3 terms come to mind. Patents, which I understand to be kind of around invention, copyrights, and trademarks. I don't even venture to try to distinguish between those final 2. Are there any other classes that I'm missing and how should I kind of conceptualize those 3 buckets?

Cecilia Ziniti (7:29) Yeah, yeah. So trade secret is the other typical fourth category considered. So the famous example is of course the formula for Coke that, you know, the 2 people that know it can't fly together and it's very, very deeply protected. But essentially, those 3 buckets of, you know, patents, trademarks, and copyrights, those are going to cover sort of most of what you think of as software and what you think about with AI and the production of LLMs. So we can dive into each of them. I think they each have their own nuances. They each are created in slightly different ways and protected in slightly different ways.

Nathan Labenz (8:05) Particularly, I always think about inputs and outputs for AI systems. I'm kind of interested also in what do you have to have or create to sort of be the input to then have these rights? Then on the other end, what are the special rights that I have in virtue of having these things?

Cecilia Ziniti (8:25) Yeah. Yeah. So distinguishing again the 3 types of IP, and then we'll jump into copyright because I think that's where the New York Times suit is and that's where it'll get kind of juicy for your listeners. Patents are a pretty formal process. So you actually apply for the right to exclude on an invention. And the patent itself has what's called claims that lay out the steps for whatever it is. And it's called prosecuted, but it's very heavily scrutinized at the time that you file. So you file for a patent on a given technology and then, you know, with the patent office, you have some back and forth called office actions that goes into that. But you don't kind of automatically get it. In contrast, copyright literally attaches on creation. You don't have to do anything. Like you can literally decide, I want to write a poem about my kid. Grab your pen. Start writing. Congrats. You have a copyright. It's when it's actually in a fixed medium of expression. In your head, not a copyright yet. In a fixed medium of expression such as typed or written out, you likely have a copyright. Now if you just, from memory, were to write out the lyrics to a Taylor Swift song, you would not have a copyright in that because she previously has a copyright in that particular expression of that song. But if you were to rewrite, you know, a Taylor Swift song and instead make it about potato chips, again, you have a copyright.

Nathan Labenz (9:47) Very interesting. So on the patent side, this is not pertinent so much to the main topic for today, but I guess for one thing, it's a common claim that the patent system is broken, and we certainly hear stories. I don't know how really common they are or if this is the dominant narrative or just selective storytelling. But you certainly hear this idea that the patent system is broken and the big companies that have the resources to do it, they're going out and patenting all kinds of things very well in advance of actually being able to do it. Then they can squat on them or protect themselves or what have you. But one thing that's very striking about the current AI moment is Google invented the transformer and either they didn't patent it or they patented it in such a way that, I guess I'm kind of embarrassed that I don't know this, but everybody else is using it, right? So could they have patented the transformer and just chose not to?

Cecilia Ziniti (10:44) So they could have patented the transformer model. So software patents, a little bit of a debate. There was quite a reining in of software patents under a line of cases called Alice. But essentially you still can patent software processes that have a specific result and that are embodied in a particular way. According to Twitter, of course, my source for legal research, not really, but I did check, the transformer model was patented by Google. But as for them enforcing it, it's kind of a losing game for them to enforce it because they're the platform. They want everybody developing. And so their patent strategy is much more defensive. So there were the patent wars; Google bought Motorola and, you know, the smartphone patent wars. I billed many hours of my life on those cases. But essentially, you know, there was sort of a detente reached. So little known fact: for every Android implementation, $10 goes to Microsoft. And most people don't know that, but literally for every Android phone, there are so many Microsoft operating system patents that the deal that was reached was that that would be the case. So if you're, you know, a Samsung or, you know, any Android device maker, it's actually a question whether you trigger this particular licensing fee. It's a big deal because it's an element of the BOM, the bill of materials, for a piece of hardware. And so essentially, like, yes, could Google go after people? Google is one of the bigger patent filers and patent holders. But in general, their strategy has not been to go after people. They use them much more defensively. So for example, Sonos had a big series of patent cases against Google. And Google, of course, had a bunch of patents that it could assert against Sonos. So that litigation went on for, I think it's been 7 years, 8 years. 
And, you know, if I'm Sonos, I don't know that it was wise to sue Google in that scenario to your point, that it's like the pockets are really deep, the portfolio of patents that Google has is really wide. And so chances are even a great result would be some kind of cross licensing.

Nathan Labenz (12:58) Yeah, interesting. So, I mean, this would be If there was 1 moment, right, that would maybe challenge that detente, like this might be the 1, right? You could imagine a scenario where Google would say, Hey, yeah, don't be evil, all that stuff. We kind of semi retired that anyway. But OpenAI and Microsoft have gotten ahead of us here in a way that we really can't allow. And it might disrupt search, our core business, and we own the patents on this. Let's go shut them down. If they were to do that, it seems like they would have a chance of winning, right?

Cecilia Ziniti (13:28) It's complicated. So I mentioned that $10 that Google pays Microsoft; the amount of Microsoft patents that Google reads on is potentially infinite, right? So every single Google service probably has a basic file system, for example, and Microsoft's patents go super deep in file systems. But more it's the PR thing, right? Can you imagine Google files a big patent suit against OpenAI? Talk about Code Red. So Code Red was last year when ChatGPT came out and everybody at Google freaked out and they really marshaled the company in this direction to move faster, and then Bard and Gemini and so on came out of that. But if they sue, then, if I'm their strategist, so not just their lawyer but their strategist, I'm thinking the press on that would be: old line Google, in its last throes, tries a legal mechanic against OpenAI, right? So it wouldn't play out great that way. Could they? Yeah. I mean, on the OpenAI side though, there's a concept, and your listeners might find this super interesting. You can design around a patent. So remember I mentioned that there's the claims, so very specific step 1, step 2, step 3 to do this. And if you skip one of those steps or you do something a little different, you potentially don't infringe. And that infringement analysis, you know, that requires a court case itself. So one fun story about Google: my first job on a legal team was actually at Yahoo in the early 2000s when they were still competing with Google, right? So Yahoo stock was going up $2 a month. I was very happy. I was able to pay for law school from that. And they were still competing in search. So Qi Lu and some other really big industry luminaries were there, and eventually Hadoop came out of Yahoo. Yahoo doesn't get great credit for its tech, but it was good. But in any event, Yahoo bought a company called Overture, which had literally come up with sponsored search. 
And they had a patent on sponsored search, which is literally the entire Google business model. And so there was a suit on that, that I went to a couple of hearings for. And the issue was that the patent, if you pull it up, you can Google it, not Yahoo it, but you can Google it and find it. The patent says, you know, the claim: what is claimed? That's how the patent reads. It's very formal. What is claimed is a system or method wherein placement in search is, quote, determined using bid amounts. So, "determined using." Google's defense was: we don't determine your placement in search using bid amounts only. We also look at relevance. We also look at the name brand. And they had all these things, and it basically said, no, no, no. Our algorithm is way more complicated and interesting. It's not practicing this patent because we determine it using a plurality of factors. So that was kind of a legal thing. And that case settled. Litigating on that point, really tough, right? We were analogizing to GPAs, right? Okay. If the valedictorian is determined using GPA, but then if you play a sport, you win. Okay, is that infringing? So you get into these really fine distinctions that kind of almost match LLMs, where each word matters. There's a whole process in patent law called claim construction where the parties basically agree on what the definition of each specific word is that they're going to use going forward. So yeah, patent law, you know, similar to fair use under copyright law, which we'll get into, is expensive and unpredictable to litigate. That's the theme you're going to see in a lot of these kind of legal solutions: when a system is unclear or gray or based on the meaning of specific words, it becomes more of a challenge to litigate and more of a risk.

Nathan Labenz (17:12) Hey, we'll continue our interview in a moment after a word from our sponsors. Yeah. Fascinating. Okay. So there's the law and then there's the public relations question. Then there's also kind of the game theory dynamics of, like, if Google starts suing Microsoft, Microsoft can sue them right back on a million different things as well. And then everybody is kind of in a state of mutually assured destruction. I don't know if this varies across these different types of intellectual property, but to what degree, when I have a monopoly, does that mean that basically I have total rights to choose who I deal with and how? Now, I guess I saw an interesting proposal the other day where somebody said, there should be mandatory licensing where you can't hold your invention totally exclusively; you should be able to profit from it, but you can't say that nobody else can ever use your good idea. But as I understand it right now, you get to decide as the owner, right? You could just refuse outright: I just don't want to deal with these folks and that's that.

Cecilia Ziniti (18:18) Yeah. I mean, there are some kind of statutory exceptions with respect to drugs. So there's a whole regime around generic drugs and the amount of head start that the original patent holder gets and how they do that. Or sometimes, the whole open source movement: Google has open sourced a number of patents, as have most of the big players. It gets back to that constitution line, right? It says exclusive rights. So it's very similar to real property, right? Like if you have this plum piece of land next to the county fair, you don't have to make it a parking lot. You can, you might make great money, but you don't have to. Now there is a process called eminent domain where the government could potentially come in, but that's really contra to the original constitution. Another kind of fun fact: it was life, liberty, the pursuit of happiness, but the original text was life, liberty, and property. And so it's a very serious thing and role that the government plays in giving holders of both real and intellectual property the right to, really, capitalism. They can do it, or they can not. They can do it badly. They can be selective. Now there are some rules, like the Civil Rights Act and so on, about, like, serving people at your restaurant. There are rules on that. But in this case, how Google decides to commercialize or not the transformer model, it's a strategy question and they took one path. Seeing OpenAI really commercialize it as they have, and have just such a breakout hit so quickly, it sort of makes you wonder: what other gems is Google sitting on? I have the same curiosity as you do.

Nathan Labenz (20:00) Interesting questions about discovery too. My understanding is, if this ever were to get litigated, then would the records become public? Is it the case that the only reason that we originally knew why or how Google was ranking its search results and all these different factors was because that came out in discovery? I wonder what Google might learn about what OpenAI has got up their sleeves if they got into a courtroom and everybody had to tell all their secrets.

Cecilia Ziniti (20:30) Yeah. So, I mean, Google's definitely not a stranger to discovery, and they've definitely been under antitrust scrutiny for a number of years. And, you know, I've been an in house lawyer. I was at Amazon for almost 4 years. You know, we train folks internally on, really, you know, the New York Times test: literally, what you write, would it withstand the front page, both from a PR perspective and a legal one? But in terms of actually the litigation process, it's one of the things that I think Meta actually got called out about. I think it was Frances Haugen, there was like this whistleblower suit a couple of years ago, where, you know, they essentially used Facebook internally. Cool, dogfooding. But anybody could sort of go into any of what were, like, completely open public Slack channels. And at a company of that size, like, there's a lot of info you can have there. And they've since, you know, locked that down, but that's how Frances Haugen was able to actually pull down different research studies that she eventually presented to Congress: because of this open system. So that being said, I'm pretty pragmatic about it in my legal advice in house, in the sense that information kind of wants to be free. I've been at companies where we had almost completely public Slack channels, and from an engineering standpoint too, you can kind of scroll, you can get the history, you can understand. So that's the benefit that you're weighing against the risk of, like, yeah, if you get in a litigation then it's like a free-for-all and you see things. Lots of fun examples there. All the big tech companies were sued by the California attorney general over app stores. And some of the discovery there was like, don't ever give refunds, this is a great revenue stream, and kind of just pretty bad statements internally that led to suits. So a lot of it is also just communicating smartly.

Nathan Labenz (22:21) Very interesting. Okay. So let's then jump to the copyright side. So you've said that as soon as I make something, I immediately have copyright in it. This falls under the same exclusive situation, but yet I guess a couple of questions I have on that. 1 would be like, is this exclusion? Does it matter if I'm getting paid for using somebody else's copyrighted material? And also, I feel like I see cover songs all the time that are not necessarily signed off on, but seem to happen. So give me a little bit more kind of a rundown on the copyright side of this.

Cecilia Ziniti (23:03) So your question on, is copyright a commercial right? Is it another kind of right? People do look at it as a personal right. So in France, it's actually called a droit moral, the French term, but essentially, as a creator, I have this kind of really ethical right to exploit my work and to be the one to exploit it. But as for how you do that, the metaphor that they teach in law school, which I think is a good one, is that an intellectual property right is a bundle of sticks. So as an example, for a movie, you can separately license the rights to play the movie on the plane, the rights to play the movie on streaming, the rights to play the movie on cable, the rights to play the movie overnight, the rights to play the movie in Italy, like whatever it is. There's infinitely many ways that you as the owner of that right can decide to split it up. And you see that, right? Like Netflix has a huge licensing department that writes big checks. Your Hollywood studios have, I think they call it, business affairs, but legal and business affairs. But essentially that's the team that thinks about how to split up these rights. And then there's merch and all kinds of downstream stuff. So that's the right that the copyright holder has. But there are limits on that, and fair use is a really important one. So fair use, kind of interesting. It's not a defense. It's not like you're infringing and then you claim fair use, although in practice that's how it ends up happening. Fair use is actually beyond the reach of the copyright. So the exclusive right in the first place doesn't stop people from using your work in a way that is deemed fair use. Now how you decide that, it's 4 factors. It's squishy, and we can get into it, but that's the other important limit on copyright.

Nathan Labenz (24:46) Okay. I'm not sure I fully understand the distinction between you're accused of violating copyright and then you defend yourself with this versus it doesn't extend to that in the first place.

Cecilia Ziniti (24:55) The reason it matters in practice is that you say it's not infringement, and it's kind of like, I guess it's a defense, but procedurally it's not like, oh, I did it, I infringed your copyright, and then you come back to, like, no, no, it wasn't infringement because it was fair. And so the 4 factors, and by the way, they're not exhaustive. The test literally says, may consider these factors, and then you can consider others. So it's squishy and it's a balancing test. So whenever you have balancing tests, it's like, oh, on the one hand, on the other hand. And it's not like, I don't know, provisions of the tax code where it's x percent of your income and it's relatively clear. Copyright is known as one of these areas. It's a little more gray, and that kind of makes sense because it's creative expression. But in terms of the factors, let me pull up. I wanna make sure I'm precise for your folks. I'll pull up the factors now. But it's basically, the purpose and character of the use. So how is the alleged infringer, in this case OpenAI, using the work? Is it for profit? Is it transformative? There's a bunch of sub questions under that. And that first factor is the one where there tends to be the most play in technology cases. That's factor 1. Factor 2, the nature of the copyrighted work. So the work that is being infringed, or allegedly being infringed, is it creative? Is it the kind of thing that copyright wants to encourage and society wants to encourage? The New York Times here, they would say, yes, of course it is. You know, we've got news articles going back to 1851. We've got news articles where literally we discover things that lead to criminal prosecution. We've got 50 to 100 million readers a week on our work. It's very creative. This is the type of work that copyright wants to protect. So that's that second factor. You'd probably have a situation where OpenAI would most likely admit that. 
It's not like the New York Times is a phone book, which is the one case that found that, okay, phone books are not super creative. There's only so many ways to list phone numbers and names. Copying that might be a commercial problem, but it's not a copyright problem. And then amount and substantiality of the portion used is factor 3. So how that plays out is, do you need to use the whole work or can you use bits and pieces of it? Usually this plays out in critical work or in sampling cases. Or, you know, if you are doing a critique, do you need to show a whole movie? No, you can just show clips of it and do your critique. So that's that factor. And then finally, and this one is gonna be important in this case, the money: the effect of how the infringer uses the work on the market and value of the original. So if it's a one to one substitute, in other words, people use ChatGPT and they cancel their New York Times subscriptions because they can get full articles, that's much less likely to be fair, versus if it's a complementary use or if it's not gonna affect the ability of that original rights holder to do what they want with their bundle of sticks. Make sense?

Nathan Labenz (28:10) Yes. Although I'm not necessarily ready to judge the case.

Cecilia Ziniti (28:13) If you want, in the show notes I can also give you a fair use cheat sheet or something like that. I don't know if people would like that. There's also a resource that might actually be helpful for folks. The Library of Congress maintains a database of fair use cases, and they've organized it really nicely. They've put in summaries so you can literally look things up, you know, tech case that found fair use or tech case that didn't. And the Copyright Office, again, encouraging authors to be creative, gives a lot of guidance on fair use that you can check out.

Nathan Labenz (28:44) These sound like great resources. We'll continue our interview in a moment after a word from our sponsors. What would you say have been the most recent waves of battles between, let's say, technology and other constituencies, rights holders across society, just prior to the current wave of AI stuff? Where have the fault lines, or the battle lines, been drawn just before now?

Cecilia Ziniti (29:12) Big technology shifts tend to have lawsuits that go with them. And part of the reason for that is that you really see money moving. So, you know, I talked about when I was at Yahoo, and at the time online advertising was still pretty new, and the share of wallet of a given advertiser, let's say General Motors or Ford advertising cars, they were still doing it in print media and TV. And now, you know, that has shifted. So when you see shifts like that, it's like, okay, who's losing money? They're probably going to sue. Who's making money? They're probably going to get sued. And so this OpenAI suit is a pretty classic example in that paradigm. Others, you know, so I was involved in the Apple Samsung case, and basically Apple came out with the iPhone. It was such a step function leap from the flip phones and, I guess, dumb phones. But in any event, the ability to actually scroll on your screen and access the web in this intuitive way. Samsung very quickly came out with their own version, and that was a big IP suit. So that's what I would look for in terms of trends. Now, the legal mechanics of how it happens tend to differ. Digital music is another great example, also involving Apple, plus the scraping cases and the Internet at large. Google's trademark lawsuits are another great example. It was not a settled question of law for a long time whether, if you search for Acme, Acme's competitors could buy that keyword. For a while, there were a bunch of lawsuits about that. Google won them all, and basically now that's a pretty settled area of law. But essentially Google's argument was like, well, we're presenting information. It's a different kind of use. And the other fun factor is that judges really like using Google. I really like using Google. As a lawyer, it's 1 of the best tools.
Based on that, my view, and this is I guess a little bit more in the political economy space, is that when judges like a technology, they kind of find a way to rule for you. The market finds a way. I'm pretty optimistic in that sense.

Nathan Labenz (31:27) I find this fascinating, and I'm, as you can tell, very novice in all of these different aspects. But I do find the question of, is the law really the law, or is it sort of this meta situation of figuring out how we want to interpret the law to do what we want to do? I think that's always a really interesting question. So you mentioned scraping. That's starting to get pretty close to the current question, right? My understanding is that if stuff's out there on the web, you're broadly allowed to scrape it, to a first approximation. Like, LinkedIn lost some cases, I understand, against folks who were scraping LinkedIn profiles and whatnot.

Cecilia Ziniti (32:11) Yeah. So this gets to how copyright and commercial rights are kind of different. So LinkedIn has in their ToS, presumably, that you can't scrape by automated means in such a way that it takes down their site. Right? And, you know, it would be a violation of the ToS. That being said, we do have this case law that use for reverse engineering is okay, and that facts themselves are not copyrightable. So that great telephone book case I mentioned, Feist versus Rural Telephone, went all the way to the Supreme Court. And the copier won in that situation, and the Court said, okay, there may be other things, but in this scenario, copyright was meant to promote creativity. It's not meant to promote labor, just the work of putting together a phone book. It's meant to promote the creativity. And that's the line of cases that LinkedIn effectively lost on. But there's other ways now, right? Like how you calibrate the scraping, if you're actually going around a paywall. And 1 of New York Times' arguments here is that they very carefully calibrate where to put the paywall. Maybe you get a guest article, you get a certain number. You know, maybe if you click from Instagram, you get the article for free, but then not if you try to reshare it. And that's their right. This is really a property right. Getting back to that carnival parking lot example, I can decide to just have my parking lot open on Sundays, and that would be my right. And here the property holder, New York Times, says, I get to decide where and how people see my content. So that's how it kind of relates. In the scraping case of LinkedIn, you know, the fact that my name is Cecilia Ziniti and my title is founder and CEO of GC AI, that's just a fact. And so even if somebody discovered that through LinkedIn, LinkedIn would not have a claim on that.

Nathan Labenz (34:08) Then maybe let's start to try to separate some of the possible issues of what's going on with The New York Times and OpenAI, because I could see different kinds of complaints that The New York Times might have. A few that come to mind are: 1, you took all of our stuff, loaded it in with a bunch of other stuff, and trained your models on it, and we may or may not like that. 2 would be, we've seen these examples of ChatGPT outputting very close to verbatim, essentially plagiarizing an article. I guess I don't know if it would count as plagiarizing if it has been effectively attributed to The New York Times, and I'm not actually immediately sure whether those examples did say this is a New York Times article or not. But either way, it was able to essentially memorize and, as OpenAI has called it, regurgitate the full text verbatim or very nearly verbatim. So they might not like that. And then I guess the third 1 would be, they now have these browsing capabilities where ChatGPT can go out and access stuff on my behalf and pull it back. And so now we kind of have Google-like questions, or maybe even Facebook-like questions, of, well, what's the right way to do that? Should they be able to show the whole article? Should they be able to just summarize the article? Should they be able to give you the headline only and link out to it? So is that a good organizing framework for the issues here? Am I missing any? And which ones are really the core ones at issue?

Cecilia Ziniti (35:44) Yeah. No, you separated it perfectly. And in fact, that's how OpenAI and people have been thinking about it. 1 of the reasons I thought the New York Times case is stronger than others is that regurgitation issue. So they hired somebody to go and look and get these verbatim articles, and they have a deep enough library that they got it. So that 1 is much more clearly infringement, because if you look at the factors, it's like, okay, you're using the whole work, you're a possible substitute, it's really taking away that control from the copyright owner, and it's not really transformative, in the sense that if somebody wanted the original article, they could go to New York Times or, as shown in the complaint, they could go to OpenAI. So that's this regurgitation issue, and OpenAI has categorized it as a bug, a singular bug. Now, I don't know. I'm not an engineer, but that seems a little weird to me. But in any event, they're trying to show it's an isolated thing that they are fixing. I would expect in discovery for them to come out and defend on that. But that issue, per OpenAI, and I think you could say reading the complaint, is not really the core issue. The core issue, the 1 that I think is the reason this case has caught so much attention and could be a watershed, is really the training. Like, I talked with a creator over the weekend. They've got a podcast and they're a relatively famous creator. And they said they've had 3 or 4 people email them and say, oh, hey, I trained my own GPT on all your work and it's been so helpful. And she's like, okay, thanks. And it's sort of like, how am I supposed to feel about that? Maybe it's not verbatim copying, and who knows what it outputs. But even if it rewords the phrases, if it's the business advice that this person would give, should there be compensation for that? And that's its own question.
And copyright law is not a perfect fit, because of what I said. But with the 4 factors, you can figure out how you might, you know, it'll be fascinating to see how New York Times and other creators argue it. You mentioned the idea of music covers. The music industry is actually a great analogy here, because there's a very clear and robust system for how you license music. So Weird Al, who does funny things with songs, he takes very popular songs and makes them about potatoes or something, he actually licenses all his music, because he's like, I don't want to mess around. I know it'll be a good work. I'll just pay the original. And that's what he does. So that gives you evidence of a very sophisticated creator electing to pay, because there's a very clear system. And that was also the benefit of iTunes. iTunes came out, and it was like, okay, we knew that Napster sort of felt, and I was talking with Jason Calacanis last week and he's like, yeah, it felt like stealing. It felt that way to me too. And that's where you get that balancing and those kinds of public good concerns. Here, it doesn't quite have the same feeling as Napster, but it doesn't quite feel like iTunes either. Maybe an iTunes will emerge, where if somebody creates a GPT of all my friend the creator's work and it's helpful to them, you know, should she get paid in some way? Should there be some kind of Google-like result where it's like, you can do this? Maybe. And I think that could be what emerges.

Nathan Labenz (39:18) Yeah, fascinating. Okay. So it's safe to say the law was not written with any anticipation of these new technologies. And maybe in a few minutes we can turn toward what the law should be, or there's always the possibility that Congress could do something. But for now, going back to the fair use defenses: the main 1 here, as I understand it, is just the transformative nature of the activity, right? And it's funny, the resonance, I don't think this was planned either, but the transformer being the architecture that is commonly in use.

Cecilia Ziniti (39:56) Exactly. No. Names matter. Right? I mean, think of Uber's god mode, and, you know, the privacy folks got really upset about that. And when I advised engineering teams internally, you know, people are like, oh, we're gonna come up with a product called Eye of Sauron. I'm like, yeah, let's not put that in the code. That's the kind of thing I advise on. So yeah, this kind of terminology and advocacy, like the fact that it's called a transformer, is kind of a nice fact. 100% agree with you.

Nathan Labenz (40:22) To say that this is a transformative use on the face of it seems pretty clear to me, aside from, I mean, there is this regurgitation issue, though I certainly don't think most people are using ChatGPT to try to get it to spit out old New York Times articles. So there does seem to be a pretty simple man-on-the-street case: hey, we took all this text, we made this artifact out of it, and look at what this artifact can do. Would you call that transformative? I would say the vast majority of people would be like, yeah, that's clearly transformative. But somebody will dispute everything. So who is disputing that, and how are they disputing it?

Cecilia Ziniti (41:05) So that's all right. And the legal analysis, you know, you're half a lawyer already, just having explained it that way. So good job. But what the fair use analysis looks at is: is there a new purpose, meaning, or message in the thing? So classically, a critical article about a movie. I'm Roger Ebert, and I'm writing a critique of a movie, and I quote a couple of lines. That's clearly a different purpose. You're reading to evaluate the movie: really a different meaning and a different message. So Roger Ebert is good. In terms of GPT being a different way to use the work, obviously that's going to be a huge percentage of their case, really pushing on that and how it's different. Where New York Times would potentially poke holes in that is: okay, but you have to implement measures so that it won't be a substitute. And this is where OpenAI will say, and have said in their response in the blog post, that, you know, we're working on fixing this bug, the regurgitation bug. I would be curious to see stats on it. We have different content measures that prevent users from asking for the copies. So this is the equivalent of, like, if you go to Kinko's with something, you actually sign a piece of paper, or click in their online interface, that you have rights to copy whatever it is. And there are cases on that, and OpenAI would say every single user signs our terms of use. Our terms of use say that if you do a bunch of fancy prompting to get regurgitation, that's a violation. And they would have to have not just a terms of use protection, but also technical ones. Right? So Napster never actually got the copies of the music, but they knew that's what was going on. They knew that copyrighted music was most of the file sharing. It wasn't sharing, you know, open source notes or anything like that.
It was really, if you looked at a Napster server at the University of Illinois in 1999, it was all Soundgarden or Metallica, as the case may be, right? And so that kind of knowledge of potentially contributory infringement, or other kinds of infringement, comes down to: what is the technology provider doing? Now, you could end up with a congressional solution. So the DMCA, the Digital Millennium Copyright Act, was supposed to deal with the fact that if I can post pictures online, which of course I can, you know, early sites like GeoCities, Shutterfly, whatever, I can actually put up pictures and they are potentially infringing. DMCA says, okay, you're the photographer: send a note to Shutterfly, get it taken down. Send a note to GeoCities, get it taken down. And there's a whole process for that. We don't have that yet for LLMs. And so you might expect that something like that would emerge, for it to be fair, for OpenAI to truly be able to say, look, copyright infringement is not our intent. We're not intending to be a substitute. If you're a rights holder, you can do x, y, z things to get your content either pulled out, or some kind of compensation system like the music folks have. That's gonna be where the play is. But, of course, in that process, they're going to defend and say, no, it's a generative pre-trained transformer. The intended use of this device is like a VCR, for your own personal use. So that's how they would go on that.
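The "technical measures" idea discussed here can be made concrete. As a purely illustrative sketch (this is not how OpenAI actually does it; the function names and the n-gram threshold below are assumptions), a provider could flag outputs that reproduce long verbatim spans of a protected source:

```python
def ngram_set(text, n=8):
    """Return the set of n-word shingles in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_like_regurgitation(output, source, n=8, threshold=0.5):
    """Flag a model output when a large share of its n-grams appear
    verbatim in a protected source document."""
    out_grams = ngram_set(output, n)
    if not out_grams:
        return False
    overlap = len(out_grams & ngram_set(source, n))
    return overlap / len(out_grams) >= threshold
```

An overlap check like this only catches near-verbatim copying; it would miss reworded outputs, which is exactly why the training question is harder than the regurgitation question.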

Nathan Labenz (44:36) So is there any in between? For example, with the emergence of Spotify, we now have these deals. And I guess they could always make a commercial deal between them, right? Like, the New York Times and OpenAI could say privately, here's our deal, we'll pay you X, you'll not give us a hard time anymore. But if they don't want a deal, is there anything between OpenAI winning, and it being like, yep, this is fair use, you're cool, and, on the other end, it's not? And I guess the downstream question would be, then what happens? Because they've made some pretty aggressive demands around the destruction of models, right? So is there any other legal in between?

Cecilia Ziniti (45:16) Yeah. So the law is actually relatively adaptable in that sense. As you mentioned, New York Times had different claims, different counts of the lawsuit, of like, you did this, you did this, you did this. Those are typically bifurcated and analyzed directly. And then, you know, it will take time, but it's very common for 1 issue to go up to a higher court, and then that court, it's called remand, basically sends it back down to look at the specific facts. So procedurally, can the law handle these shades of gray and different accusations? Yes, it can. In terms of what will happen, and is there a possible middle ground? Yes, I think there absolutely is. You could have a scenario where the rule comes out saying, okay, on regurgitation: for a use to be fair, the model provider has to take reasonable measures to prevent regurgitation, something like that. And then the question becomes, okay, what's reasonable? Is there some kind of industry standard that emerges? So that's 1. Another way it could go is, you know, they could absolutely settle. They could have damages going backwards. So GPT-3.5, which I think was a little bit worse on regurgitation than GPT-4, it could be like, for that model, we get some kind of royalty. There could be infinitely many solutions. You remember the bundle of sticks I talked about? It's a really helpful framework for thinking about, okay, where do we put the line? How many sticks does New York Times get to keep, which ones are they going to rent out, and which are they going to get paid for? But what's hard about this is that we know, and I think a lot of us in technology believe, that generative AI is absolutely going to be incredibly lucrative. I think BCG says it's like a trillion dollars of productivity coming.
That being the case, with any deal you do now, are you kind of selling the baby, because New York Times wouldn't get the eventual value? It's an interesting question that I don't have an answer to, but I do think a middle ground is very possible. I disagree with some commentators who are like, oh, it's the end of OpenAI. No, that cat's out of the bag. It would be very unlikely to get any kind of full injunction at this point.

Nathan Labenz (47:35) But there isn't anything that's like, You can do this, but you have to pay. That would be between the parties.

Cecilia Ziniti (47:42) Yeah. I mean, injunctions themselves also consider 4 factors. And 1 of the factors is: is it a harm to whoever's seeking the injunction that cannot be solved with money? And if you think about it, there's actually very few things that can't be solved with money. And so what ends up happening, as you said, is that it's a private agreement. From the court, it's typically unlikely. I mean, sometimes you might get fines. Copyright does have statutory damages. So in the case of Napster, the record companies did go after individual people. There's a woman, Jammie Thomas, a lady in Minnesota, that had, I think, 80 songs or something. And at the time the statutory damages were, I think, 15 k or something. They go up with inflation. But anyways, she ended up having to pay $400,000. She's just a regular lady. And so that was the record industry saying, no, no, we're gonna get our specific damages per work. And so you could get something like that. I think it's unlikely. Also, OpenAI has said they will indemnify. So unlike Napster, which in that case didn't indemnify this woman, OpenAI has said they would. And so, you know, I think this gets it more into the kind of clash of titans scenario that you mentioned at the beginning.

Nathan Labenz (49:03) Yeah. Okay. So I guess another question is, what are the damages? It seems very unlikely to me that there are significant ones. This is something my wife, who is a lawyer, has trained me to think about. Anytime somebody's like, We're going to sue them, she's always like, Well, are there really any damages? And a lot of times there's not. It's very hard for me to imagine that they're going to be able to produce a ton of people who are like, Yeah, I canceled my New York Times subscription because I now have ChatGPT. They're pretty different products, and I really don't think many people are doing this regurgitation thing. So it seems like it's more of a principled thing, right? If I'm reading the 2 sides and trying to infer what they really care about, it seems to me that OpenAI probably wants to set a precedent that what they're doing is allowed and fine. They want to put this to bed once and for all, and this is a good chance to do that. And then on the New York Times side, it's a little bit harder for me to say, but it almost seems like they want Congress to act. It doesn't seem like they really are going to win in the courts. They might need Congress to say, Hey, we need a new regime here entirely, because this training thing, it is transformative, but we can't necessarily have everybody's stuff sucked up and transformed with no compensation.

Cecilia Ziniti (50:22) Yeah. I mean, a congressional solution is very possible, and this kind of strategic litigating with that in mind, I would say New York Times is likely doing that. New York Times has also lost at the Supreme Court before, on copyright. So they have that history as well. But in terms of the damages, and is it the principle of it? Sure, there's definitely a principle involved here, but that's actually something that the law is pretty good at handling. Right? It's literally saying, weigh the public good. Will New York Times, or future journalists, future creators, continue to create works if they know that they won't be compensated and that their work will train LLMs? That's the kind of thing that I expect the court would consider, certainly at the highest levels. In terms of the damages, I mean, if I'm a lawyer for New York Times, you know, they've got good lawyers, they can find people. You know, I'm seeing Andrew Chen, who I love, an Andreessen Horowitz investor. He posted a long and viral-ish tweet thread about how he uses GPT only for news. So that's 1 example, but it's the kind of thing where it doesn't take a huge mental leap to think, I have ChatGPT open as a window all the time, right? So this is a tech shift. And when these behaviors change, it's musical chairs, and who's holding the bag? I'm mixing metaphors here. But anyways, you can see why they're doing it. It does come down to money too. Of course, my favorite quote about copyright was from law school. We had the attorney for a big rapper or artist, I don't remember if it was, like, P. Diddy or Beyonce, as a guest speaker. And the question was, how does copyright come into play in your work?
And he just totally deadpanned, like, well, copyright is not about the money. And all of us were surprised. He's like, it's about all of the money. And we're like, okay. So that was, I guess to your wife's point, a very cynical way to look at the law, but helpful. And so, in a scenario where New York Times can get an ongoing royalty forever, of course they're gonna try to go for that.

Nathan Labenz (52:41) But just to be clear, that wouldn't be possible today aside from direct agreement by the parties. A judge cannot say you have to pay in perpetuity.

Cecilia Ziniti (52:52) It's not as clear here. There are scenarios, like a Napster scenario, or with watermarks, where you could show a specific number of copies were made. Or in the context of patent lawsuits, there are sometimes per-unit damages that are assessed. And that's the normal framework. Here, I agree with you, it'd be technically quite hard. Although, to the point of discovery, you know, you can come up with math to support it. Right? So New York Times had a graphic saying, okay, we're x percentage of Common Crawl, and Common Crawl is x percentage of your training data. Take that percentage of your $100,000,000,000 valuation, and that's what you pay. So you can come up with something, but it's not going to be very clean, to your point.
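The back-of-the-envelope math she describes could be sketched like this; every number below is made up for illustration and does not come from the complaint:

```python
# Hypothetical damages math in the style of the graphic described above.
# All figures are illustrative assumptions, not the complaint's actual numbers.
nyt_share_of_common_crawl = 0.001      # NYT content as a fraction of Common Crawl (assumed)
common_crawl_share_of_training = 0.6   # Common Crawl as a fraction of training data (assumed)
valuation = 100_000_000_000            # the $100,000,000,000 valuation figure mentioned above

nyt_share_of_training = nyt_share_of_common_crawl * common_crawl_share_of_training
implied_payment = valuation * nyt_share_of_training  # about $60,000,000 on these assumptions
```

Any real damages model would be far messier, since a share of training data is not the same as a share of value created, which is exactly why she says it won't be clean.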

Nathan Labenz (53:35) Very interesting. How do we think about the different modalities? Because on the 1 hand we have text, and it seems like, if I'm grokking this, okay, if you directly memorize and regurgitate an article, you're probably in some trouble, and they kind of recognize that, to the extent that they're trying to fix it. Then on the other hand, we've seen a lot of Mario and Luigi images generated from DALL-E 3, and those are not exact reproductions. But even though they're kind of different, and they might even be quite different, it still seems like those are drawn inside the circle of things that are not allowed, or where you would have to pay royalties on them or something, right? So how should we think about different modalities, from text to image and then maybe music? I don't know how many different modalities we can really consider, but it seems like it varies.

Cecilia Ziniti (54:38) Yeah. So you've got kind of an overlay where you can get to exponential complexity pretty quick, because you've got the 4 types of IP that we started with at the top of the hour, and then you've got x different modalities: video, images, text, music, etcetera, brand. And given the number of legal permutations and then technical permutations, I do expect that either Congress or the courts will have to come out with something more clear, or we're gonna have the canonical case for each type. In the case of Mario, you know, Sony for its video games and Nintendo as well, they're incredibly active rights holders, and you can see why. A billion people have played Mario. I read an article about his creator. Mario is literally in kids' dreams. That's what creating a world like that means. And then think about 1 of the most savvy acquisitions, Disney's acquisition of the Marvel universe: it's all IP. The ability to exploit that again is something that they paid for, and Disney is an incredibly successful, enduring corporation because of it. So in terms of the different modalities, I do think the New York Times case is cleaner, being just text, but we're going to see, you know, the visuals are incredibly compelling. I made a bunch of Marios myself. But if you think about the idea-expression dichotomy, even for an Italian plumber, you can think of lots of permutations that aren't Mario. I'm like, draw an Italian plumber. Maybe he's super stylish, like that Die, Workwear guy on Twitter with his perfect suits. That would actually be more Italian. I'm Italian, and I don't dress like Mario or whatever. So you can imagine that the ones that are so clearly Mario, it's like a feeling.
And the fair use cases on copyright that get at graphics really are about that: is it so clear that the impression this alleged infringer gives is the original, that there's no way they could have come up with it on their own? That's the test that's used. It's a difficult test, but it's been applied. There was a restaurant that was inspired by, oh, SpongeBob. Yeah. So it was a SpongeBob-inspired restaurant, but it was all the same stuff. The crab looked the same. I think they called it, like, the Rusty Crab instead of the Krusty Krab, something like that. Anyways, in situations where it's that clear, I think even in an AI context, you're going to end up with licensing. What I would expect is some kind of brand registry. So now if you ask DALL-E to make you Mario, you get an error. And if you ask for an Italian plumber, they've probably created a bunch of keywords. But I would expect they'll have some sort of self-service UI for publishers and rights holders to kind of claim things. But again, how scalable is that? How clear is it? DMCA is very clear. That was a congressional solution. So you could end up with that here too.

Nathan Labenz (57:56) I haven't studied in detail how OpenAI's opt out works. You may know more about it, but my understanding of the current state of play is that OpenAI has created an opt out, even just for the text side. And they have done a deal with Shutterstock on the image side, and perhaps other deals as well that are not disclosed. Interesting that they're going proactively for the licensing on the image side, while on the text side they're basically just saying, hey, you can opt out if you want to. It's not entirely clear, even on a technical level, how that would work. If you trained GPT-4 18 months ago, it's kind of baked now, right? Are you really going to be able to extract? You're not going to retrain. So what exactly does that opt out mean? It would be easier to understand if it was just at the level of the browsing features. I don't know if you have any more clarity on that.

Cecilia Ziniti (58:48) Yeah. So OpenAI points to the fact that in August, they rolled out specific respect for robots.txt. So you can just give the instruction. I think it's like OpenAI colon do not crawl, or something; we can put it in the show notes. And that would apply also if you have images on your site; you can do it that way. But in terms of models that are already trained, so GPT-3.5, etcetera, you can't. It's like a cake, right? You've baked it with flour; you're not going to extract the flour or the vanilla from the cake. Okay.
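For the show notes: the directive OpenAI documented in August 2023 uses the GPTBot user agent rather than an "OpenAI:" prefix. A site that wanted to opt out of crawling entirely would add something like this to its robots.txt:

```
User-agent: GPTBot
Disallow: /
```

Note that a Disallow rule only affects future crawling; as discussed, it does nothing to content already baked into a trained model.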

Nathan Labenz (59:21) It's been transformed at this point.

Cecilia Ziniti (59:23) Exactly. It's a cake, right? So, especially in the case of these small ingredients, like, you might cook something with nutmeg and have just a few little shakes in there, but you can tell. And so that would be New York Times' argument: okay, 2 little shakes, but we made the model this much more fluent, or whatever. Technically, though, I'm with some of the commentators on Twitter who have said, you know, if they're going to figure out AGI, they can figure out how not to have infringement. And there are people who have said that they're using GPT-4.5 to actually do the copyright analysis. So for some of these queries, where you're like, give me an Italian plumber saving a princess, in my own testing I get the start of a result, and then it'll catch itself. It'll render, and then it'll say, you know, error, or copyright policy. And so some people on Twitter have opined that maybe they've got the more advanced models checking for infringement on the prior models.

Nathan Labenz (1:00:19) That is definitely an interesting strategy that I do believe they are pursuing in various ways. Going forward, they're also trying to do this weak-to-strong generalization. This is out of their superalignment team, and I shouldn't say "just" because I think it's a big deal. I haven't figured out quite how I understand it yet or how much confidence I have in it. But the general premise is: the systems keep getting more powerful, and at some point they're going to be more powerful than us. We still want to teach them what we want them to do, but we need them to be able to generalize beyond our ability to supervise directly. So they create a toy version of that problem where GPT-2, I think, becomes the weak teacher and GPT-4 or whatever is the strong student. GPT-2 isn't that good, right? But if it can give you this sort of imperfect but directional signal of what to do, how does GPT-4 learn from that? Does the strong student perform better than the weak teacher? And they show at least some weak-to-strong generalization. I'm not a big believer that this is about to solve our problems, especially given some of the hacks that I understand are included there. One of them is this: you would ask, well, how much better can the strong student do relative to the weak teacher? And it seems like they can turn up the delta, the strong student's improvement relative to the weak teacher, but they do that by increasing the confidence of the strong student. So basically, they make it more willing to override the weak teacher, and therefore it can do better. But I'm kind of like, well, wait a second.
If the solution is going to be to tell these super-strong AIs that they should just be more confident in overriding their weak human teachers, that doesn't sound like a scheme that I'm ready to bet the fate of the world on.
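(Editor's note: the weak-to-strong setup Nathan describes can be illustrated with a deliberately tiny toy, unrelated to OpenAI's actual code. A "weak teacher" here labels data correctly only about 75% of the time, and a "strong student," plain logistic regression, is trained only on those imperfect labels. Because the teacher's errors are unsystematic, the student can end up more accurate than its teacher. All numbers and model choices below are illustrative assumptions, not the paper's method.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth task: linearly separable 2-D binary classification.
n = 2000
X = rng.normal(size=(n, 2))
w_true = np.array([2.0, -1.0])
y_true = (X @ w_true > 0).astype(float)

# "Weak teacher": correct ~75% of the time (symmetric random label noise).
flip = rng.random(n) < 0.25
y_weak = np.where(flip, 1 - y_true, y_true)
teacher_acc = (y_weak == y_true).mean()

# "Strong student": logistic regression trained ONLY on the weak labels.
w = np.zeros(2)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))          # sigmoid predictions
    w -= 0.1 * X.T @ (p - y_weak) / n       # gradient step on cross-entropy

student_pred = (X @ w > 0).astype(float)
student_acc = (student_pred == y_true).mean()

print(f"teacher accuracy: {teacher_acc:.2f}")   # roughly 0.75
print(f"student accuracy: {student_acc:.2f}")   # noticeably higher
```

The point of the toy: the student generalizes past the teacher's mistakes because the noise carries no signal, which is a very loose analogue of the "imperfect but directional" supervision Nathan mentions.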

Cecilia Ziniti (1:02:34) That's fascinating. So interesting, the teacher metaphor, though. That's what I'm taking out of what you said; as a matter of pedagogy, that's been one of the most interesting things. I teach a prompting class for lawyers, and in that class I make the point that these models are basically fast, smart, overeager interns who have read the entire internet. And so in that scenario, some of these methods... I think there was a guy who found out that you could tell GPT that you would tip it and it would do better. Somebody even tried it with a dog treat: you can say, get this answer right and you'll get a dog treat. So it's interesting that whatever the behavior you don't want, infringement, whatever, whether it's Anthropic's constitutional AI thing or the superalignment work, how do you actually teach and encourage the model to do that? It's a fascinating question. I'll have to read more on this strong student, weak teacher idea. I'm like, oh shit, am I the weak teacher? And my students are great.

Nathan Labenz (1:03:38) The premise is that humanity collectively is the weak teacher, eventually anyway. And depending on how much credence you want to put into some recent Sam Altman comments, perhaps not even that far into the future.

Cecilia Ziniti (1:03:53) Yeah, that was a great conversation, the one with Bill Gates. It was super interesting. I thought it was funny that he said his most-used software is Slack. So I was like, okay, he's human like the rest of us.

Nathan Labenz (1:04:02) Okay. So coming back to the intellectual property stuff for a minute, and then, if you have a few extra minutes, I'd love to hear a little bit more about your company as well. The AI doctor of our global humanitarian dreams is too good of a promise to lose out on because we want to protect the Marvel Cinematic Universe and Disney's rights to exploit it, right? So there's something here where the collective good has to trump at least some IP considerations. I feel confident in stating that. Does the law have anything like that today?

Cecilia Ziniti (1:04:42) Yeah. So I mean, software and biotech having different interests, as an example, has been the case for some time. And the patent system, you mentioned a lot of people say it's broken, and then it's like, okay, do you separate out pharmaceuticals from software? So there are examples of that. In the context of fair use, to the credit of the law, since whatever, 1851, when fair use first became a judicial doctrine, right around when The New York Times got started, but anyway, it goes back to the 1800s, there's been this kind of public-good balancing test. And it's not a surprise, right, that in OpenAI's blog post in response, they said, not only are we fair use, but we're good for the world. And the link that they cited, you'll appreciate this because it's a legal thing: they cited somebody saving the life of their dog using ChatGPT. Now, two things are interesting about that. One is that it obviously brings to mind the public good. But two is they didn't cite a human who saved themselves, because they knew there was liability, and they say we're not a replacement for a doctor. So they picked a dog example. Similarly, you'll see that Fitbit famously can tell if you're pregnant, right? When you're pregnant, you have more blood in your system, and it shows up in your heart rate and your other stats that Fitbit can see. And when they talk about that, because of the law that says Fitbit is not a medical device but a personal fitness device, blog posts about it are always like, "and then I talked to my doctor and they confirmed." They're always citing the doctor. So the law will find a way. I do think, like I said, the cat's out of the bag; do we need to put it back in the bottle? This technology, what's gotten me so excited as a builder, is that I can't stop thinking of use cases for it.
And that was not the case for me with blockchain or with Web3. It's interesting, and I can see the rationale for it, but it doesn't have this, oh my gosh, we could apply it to this, and this, and this. I was talking with a friend who's a plaintiff's attorney, and they mentioned that when they're deciding whether to go to the mat and sue a company, some of these soft factors, the kind that would come out of Crunchbase, like, does the company have money? How do they react to lawsuits? That kind of analysis is deep and takes a lot of time. And you can imagine a transformer trained on all of the law and all of current business could come up with that result pretty quickly. So things like that have gotten even my lawyer friends excited, and lawyers are kind of like engineers: we're not super excitable, and we're naturally skeptical. GPT has reversed that. So I think the possibilities are super strong. Like I said, I'm not quite Marc Andreessen level, but I'm quite a tech optimist on this.

Nathan Labenz (1:07:40) A couple of other questions come to mind there. One: we didn't really talk about it at all, but if I were to say what has surprised me about how 2023 went, the fact that there hasn't been more pushback from the licensed professional classes is definitely pretty high up on my list of surprises. I would have guessed we would have seen something like an AMA versus OpenAI sooner, right? You're giving out medical advice without a license. Yes, you caveat it or whatever, but really, everybody knows that everybody's asking their medical questions here. As far as I know, not much of that has happened at all. We've had a couple of the tragicomic stories, like the lawyer who had ChatGPT draft the thing, filed it, and got in trouble that way. But we haven't seen a major clash of interests. Do you have any theory of why that is? It could be that maybe they don't have as much standing as I think, but I'm surprised that there hasn't been more conflict there.

Cecilia Ziniti (1:08:40) It surprises me a little bit. Certainly, I do see some of it. So the California bar came out with guidance about using ChatGPT, and of course the story made big news. But you've got even John Roberts at the Supreme Court saying, a few weeks ago in his report on the state of the judiciary, that this will change legal practice, and that legal research will soon be unimaginable without AI; that was his quote. And why is that? I think a couple of things. One, it's super useful. Literally, using my product, which sits on top of GPT-4, I can draft a demand letter that would have taken 10, 15 minutes or even an hour in like a minute. It's pretty good. I literally get jaw-dropping moments from the lawyers in my classes. So it's good; that's one thing. The second thing is, we remember the internet because we're in it. Most people who are practicing lawyers now, you're going to be at least 25 to have gotten through law school, and you remember the internet coming up and Google coming up, so you're a digital native, I guess. Based on that, I think the future of the profession gets it. And three, I think this unbundling of expertise has been happening for a bunch of years. Ben Thompson had a good article about it on Stratechery, about how he gets all these eyeballs, but he's just a guy on the internet. And that really is the democracy of tech, and that's what tech would point to. Now, the question of what it will mean to be a lawyer, we are worried about that. There are definitely some folks who are like, oh my gosh, where's my job going? You see that and you definitely hear about it.
But then we see the case of the made-up cases. Maybe GPT-4.5 or future models will be better, but you do get garbage in response to legal queries fairly often. And so I'm really bullish on lawyer-in-the-loop, or professionals-in-the-loop. But doctors, I don't know. With doctors, the other thing is, you still need to be seen. So I think Dr. Google is not great. Maybe Dr. ChatGPT is better, but we'll see.

Nathan Labenz (1:11:10) Yeah, it's happening quick. There's another recent paper out of Google DeepMind where they have a chatbot competing against humans, and also against humans plus AI, for diagnosis purposes. And it's one of the great buried ledes. I'm a collector of great buried ledes in the AI space. This is one where it's like, by the way, the AI is outperforming the human two to one; in terms of accuracy, it's like 60 to 30.

Cecilia Ziniti (1:11:38) Was it radiology or what was the question?

Nathan Labenz (1:11:41) They've been on a tear with a series of different things. This one in particular was a chat modality only, I think; I need to double-check that. But it was by far the biggest delta that I've seen between human and AI in the medical field. Previous results have been more like, they're comparable, or, the AI is narrowly edging the humans out, but we shouldn't over-interpret this. And this one is getting to the point where it's going to be pretty hard to come up with all the caveats to say we really don't have to take this at face value, because the ratio is a full two to one in terms of just getting to the right diagnosis. What I always tell people is, if I have a serious health issue, I will want to use both a human and an AI doctor. I'm not going to leave either one on the table at this point. But I don't have access to that system, because again, they haven't put it out there. It sounds very, very good, though, and at two to one, it's going to be hard to defend the licensing regime.

Cecilia Ziniti (1:12:46) No, I mean, that's right. One of the bar associations actually has this; the California bar has a duty of efficiency to your client, where if you have the possibility of using a tool to zealously represent your client, you're obligated to use it. So it's an interesting angle, including for a doctor: if your ultimate goal is patient health, then yeah, you would want to use these tools yourself. But somebody's got to train it too, right? And in this case, it's trained on the whole internet, to my point about some of these common issues. I've been continuously part of an online moms' group since my first kid was born 18 years ago, and I've written probably volumes about various kid ailments. So yeah, these ones that are funny, like toxic synovitis. I'm like, what the hell is that? Sounds super duper scary. Okay, randomly, if your kid's leg ever freezes up for like a day, it's that. You can ask your doctor and they will calm you down, but at first you're like, oh my god. And you discover that; I, at least, discovered it through these online forums, and then I called the doctor and that was the case. So yeah, again, I get back to the optimism, and I share both your curiosity and your excitement about it.

Nathan Labenz (1:14:08) Two more big-picture questions, and then the company. So, open source; we haven't really talked about it throughout this conversation. It certainly changes the practical reality. If all of a sudden GPT-4 were ordered destroyed, we would still have Mixtral, and it's just kind of out there, right? There's a company behind that one, so you could sue them, but we also see these increasingly collective forms of organizing. There's certainly a way to organize and train a large model such that there's really not going to be a good target to sue. And also, once the weights are out there, they're out there. So on the practical ability to enforce, to go sue somebody, or to extract rents, open source does seem to change the game. How would you think about that from a legal standpoint, though? Is it relevant in New York Times v. OpenAI that they could say, by the way, this is all out there for everybody, for free, with nobody you can sue? Or does that just have to sit to the side for legal purposes?

Cecilia Ziniti (1:15:14) Well, a couple of things. So first of all, there's kind of always somebody that you can sue. Like, yeah, even if it's open source...

Nathan Labenz (1:15:20) Spoken like a true lawyer.

Cecilia Ziniti (1:15:21) Exactly. There's always somebody you can sue. In the case of open source, you could sue whoever provides the resources, or you could sue somebody for hosting it; there are things to do. But how does it change the picture? I think open source is just one more way you can exploit your rights, right? Open source is, what is it, free as in freedom, not free as in beer, or whatever. It doesn't mean there's no money changing hands. And all the great open source companies that we've seen come up, and the models too; I follow JJ on Twitter, and he makes a great case that open source is actually both the way to advance software and to make a lot of money. So I don't know that it necessarily changes the New York Times picture. I do think there will be something around crawling and trespass and providing some kind of opt-out. Otherwise, as a downstream user of the open source, yeah, okay, you got it off the internet, but you knew where it came from. Then it goes to the Napster scenario, where you have the individual woman who was sued saying, oh, it's just on the internet, I had no idea. It would be hard for a legitimate company to make that argument. And you see open source compliance being a big thing, right? There are startups now that are checking your open source dependencies using AI. So whatever the problem is, you can apply AI to that problem, and then you can find somebody to sue. That would be my take there.

Nathan Labenz (1:16:46) Okay. So, last big-picture one. You said earlier that it's kind of hard to predict how these cases will play out, so you can do that if you want to, but I won't necessarily ask you to predict how the courts will rule. I'm really curious how you would rule, maybe under current law. And then, maybe even more interestingly, what do you think the law should be? We have this kind of congressional paralysis that unfortunately leaves us, I think, in a spot all too often where we're like, well, we can't hope for any new rules that would really be appropriate for this, so we just have to force everything through the old paradigm. But I would love to hear some forward-looking thoughts: if we were going to write new rules, what ought they to look like?

Cecilia Ziniti (1:17:36) What should the rules look like? The scenario where my friend, the influencer, has people creating GPTs exclusively of her work, it feels like there should be a solution for that. And my prediction is actually that there ends up being a commercial solution, where there's some startup that's like, hey, GPT yourself. Or it becomes so easy that this creator could input her website, and then you can literally get her knowledge from there. So some tech solution where, if you're literally wanting that person's insight and voice, there's some compensation for the original creator. I predict something of that nature. Another interesting thing: people have started watermarking. There's actually a tool called Nightshade where you can poison your images so that they infect the training from there forward. That feels kind of illegal to me, so I don't think that'll be the solution. But something where there is some sophisticated watermarking; there are a bunch of startups looking at that. I would expect that OpenAI or Microsoft or one of the players will buy one, and then it will become really the legit marketplace. And the third thing I would expect is just a lot of deal-making, right? So Kindle, a digital technology that affected a legacy industry quite a lot: there was a lot of deal-making directly from Bezos himself with the publishers on that. So I would expect a lot of deal-making. In terms of how to rule, I definitely would do kind of what you laid out, Nathan, with the three or four different claims, and really distinguish the regurgitation in the New York Times case. I expect the court would rule something like: you must take reasonable measures to prevent it.
More of a tort-type approach, I would expect; a tort is what the law calls a wrong. And then, on the training issue, it's tough. It's definitely going to be ruled transformative, I think, but there will be some individual rights after the fact. Or maybe you register and you say, hey, if you do a training run on me, I have the opportunity to test it, or you can get a report. You can imagine OpenAI saying, here's the number of times people asked for something related to The New York Times. So I imagine it will be a sophisticated suite. By the way, Google has had 20 years to think about how to deal with IP claims. They have a transparency report where they did, I think it was 7 billion takedowns last year or something like that, some crazy number. So I expect a system to emerge.

Nathan Labenz (1:20:27) Yeah. There's this Anthropic research about influence functions, tracing model outputs to the training data. I don't know that it's scalable to the level it would need to be to create a sort of Spotify-like rev share, but it's the closest thing I've seen to that, and I think it's probably flawed in many ways as well. It's definitely not as clean as Spotify: you go on Spotify, you play the song, you know whose song you're playing, and you can divide up the revenue in a pretty clear-cut way. With these influence functions, the purpose of the research was to show that with really small models, you have dumb, stochastic-parrot-like behavior, and keyword matching is kind of a rough heuristic for which training documents seem to most influence the generated output for a given query. But then, as they go up to the bigger models, you start to see that the documents that are most responsible have a much more sophisticated relationship, not just simple keyword matching but a genuine semantic relationship to what is happening. So this is an interesting way of beginning to quantify it. They've got some really cool visualizations as well that show that a small model is working in one regime and bigger models are working in a different regime. Would it be possible or reasonable to expect that they could calculate that for every single ChatGPT query? Probably not; that seems like it might be overwhelming. But there is at least some principled approach there where you could say, this output connects to this training data somehow.

Cecilia Ziniti (1:22:13) Or maybe it's a threshold program, right? You can imagine, on TikTok and all these others, you have to have a certain number of followers; on Twitter, you need a certain number of followers before you qualify for monetization. You can imagine things like that. I mean, I don't know. I just laugh whenever... I don't remember if you were one of the people who tweeted about this, but when the EU AI Act came out and had a specific size of model that was regulated, I always think of, was it Bill Gates or somebody, who famously said, I don't see why you would ever need more than 32 kilobytes of storage. And it's like, well, okay. I can see that now, you know? So I would challenge tech and challenge the companies to think about how to do it. Now, if that's distracting from an amazing medical breakthrough, okay. But wherever there's money to be had, which there is here, there should be innovation. And so I would expect that if it's not one of the big players directly, one of these smaller players will figure it out.

Nathan Labenz (1:23:05) Yeah, it's funny. Obviously, this is such a huge emerging force in society, the rise of generative AI broadly. But it kind of breaks some of our paradigms too. You think about how little money artists actually get from Spotify; the effective rate per play is just so low. And it feels like that's coming here too. OpenAI is doing some ungodly amount of tokens and some ungodly number of generations, but they're still only doing like a billion dollars. So you imagine, okay, what's a rev share on that? How am I going to spread it out? And next thing you know, The New York Times is probably not getting very much, even at 10 or 100x revenue growth from here. On top of that, OpenAI is trying to push their prices down as low as they can. There's a weird non-economic mode of operation where they're not trying to make all the money they could make and are keeping their prices as low as they can; some might argue, if you were a data supplier to them living off of rev share, unreasonably low. It's just a very weird thing that is hard to figure out.

Cecilia Ziniti (1:24:17) And then throw in the IRL component, right? Taylor Swift making a billion dollars on her tour this summer, which was amazing. And it's like, okay, how does the money shift? Where does it go? And is Taylor the only artist powerful enough to... she wrote a letter to iTunes and brought them to their knees and had them switch their policy on artists not getting paid during the trial period. So I mean, you can definitely wonder how this will shake out, and the amounts of money: the billion dollars now is going to seem, to your point, like chump change.

Nathan Labenz (1:24:49) Well, there are many more shoes to drop. So maybe we can get back together in the future with another case, or a new precedent, or maybe even some new legislation. For now, let's talk a little bit about your company. Usually I have to try a product before I'll feature it on the show, but you've been so generous in educating me.

Cecilia Ziniti (1:25:08) It's for lawyers. So actually, I'm not selling to you yet; you are not my ICP, to use the startup terminology. But yeah. So basically, it's getgc.ai, and it's in private beta now because I want it to be good. Essentially, it sits on top of GPT. It's a chat modality. But some of the warnings GPT now gives, as you said, like, I'm not a lawyer, consult a lawyer: okay, I am a lawyer, you don't have to give me that warning. So I've developed a bunch of prompts that make it speak more like a GC, right? A GC is bottom-line-up-front: hopefully I broke down the issues clearly. There's the viral tweet from the Wikimedia guy that was like, GPT is a golden retriever, and golden retrievers are not good lawyers. So basically, I've adjusted it to be good. That's one. Two is really developing the vertical AI so it's part of my workflow. Sam Altman mentioned that his most-used app is Slack; well, so is mine, and that's where all the legal questions come in. How do I get a copyright, or how do I do whatever. So we're integrating it there in the workflow, and within Google Docs. And then, for me, it's really about in-house. At tech companies, typically your lawyer is your in-house lawyer; sometimes you interact with outside lawyers if you're in litigation, and certainly OpenAI is going to be in that boat now. But that's where I came up, and I mentioned getting so excited about AI; that's where I'm applying it.

Nathan Labenz (1:26:36) So it's getgc.ai, and I'm maybe not in the exact target market, but I'm on the wait list.

Cecilia Ziniti (1:26:43) Well, your wife; you said your wife's a lawyer, so maybe we'll try her. But yeah, we're in the feedback phase, rooting out bugs, and we hope to launch in the next month or two.

Nathan Labenz (1:26:54) Well, keep us updated on that. This has been a fantastic conversation. I've really learned a lot from it, and I appreciate you taking the time to share all your knowledge with me and with our Cognitive Revolution audience. Cecilia Ziniti, thank you for being part of the Cognitive Revolution.

Cecilia Ziniti (1:27:11) Thank you, Nathan. This was super fun. Reach out anytime.

Nathan Labenz (1:27:15) It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.
