From the course: The State of AI and Copyright

Does the doctrine of fair use apply when training AI models?

- We know that AI models are often trained on existing content and intellectual property, and I think that's kind of the crux of this conversation: in order to learn and create, these AI models that we're talking about have to take existing information, and that raises a ton of legal questions about things like ownership. And I guess my question is, is it accurate at this point to say that the answers to a lot of these questions are just up in the air? There haven't been enough cases, or we just haven't had enough experience working in this field yet. Is that accurate to say?

- It's somewhat accurate. I would say that there are a couple of open questions. I think the law's fairly settled in the sense that we know what the law is; we're just not sure how it's going to turn out with these new facts that we're seeing with generative AI. It's super fascinating. I think the key issue, at least in the US, is fair use. Garrick, as you said, these machine learning models are ingesting lots of copyrighted information along with all sorts of other information, personal data, sensitive information. So is it a fair use of that material to copy it, which would be one of the exclusive rights of copyright holders, to make copies of their work, and to use those copies to build and train one of these generative AI models that may then output some material that looks kind of similar to that original source material? What I've heard is, on the content owner side, is this theft on a mass scale of copyrighted works? And then on the gen AI platform side, is this a transformative, novel use of the works and something we need to do for technological progress? So very strong views on both sides of this issue.

- Yeah, and has anything been determined at this point? Are there any cases we can actually look at that get us closer to answering these questions?

- No, but we've seen some arguments so far. I think in the Northern District of California, at least in the US, the District Court there, there are several pending litigations, a few of them class actions, addressing this very issue. And I suspect that if one of these cases advances to a sufficient stage in those proceedings, we're going to get some kind of ruling. I don't know which judge it'll be, but I suspect it'll be in front of a court.

- Matt, did you have something to add there?

- Well, I think in the UK it's a little bit different. So in America you have an open-ended fair use exception, and that historically has allowed some really quite radical uses of technology to be permitted by the court, making policy on the fly as it were, based on its own precedents, but not necessarily based on a fixed rule that the legislature has made. The UK is very different. We have what's called a closed list. And so we have a whole series of examples of what is fair dealing for the purposes of the UK. And so in order to train a model, or to have a model if it contains copies of examples it was trained on, or indeed to output works based on those training examples, you need to fit into one of those exceptions. And the real action is about whether it falls under temporary copies, because you've processed them in such a way and then you've deleted them, or under the text and data analysis exception, which unfortunately in the UK is only for non-commercial research, or under some other exception. So I think you have to thread the needle a lot more carefully in the UK, whereas in the US, I would suspect one judge could well think that fair use applies in the same way it did to looking at images and producing them in web searches, that it's a transformative approach that enables a new technology. But obviously Jen is the US (indistinct) so I certainly defer to her views.
