By Ewa Mońdziel, senior associate and Amelia Prawda, associate, JDP Law

In 2023, generative AI (GenAI) stands out among modern legal challenges. At the root of GenAI's effectiveness lies the training of an algorithm, responsible for generating outputs, on a specific dataset selected by the developer. GenAI therefore requires large amounts of training data, and some of this data may be protected by third-party rights.

The main private-law questions surrounding GenAI touch on copyright issues: Are developers and users of GenAI solutions copyright infringers? Can using works protected by copyright for training GenAI models lead to copyright infringement? What does it mean for authors of copyrightable works? Does copyright protect AI-assisted output?

Do AI regulations address copyright issues?

EU laws can answer some of the legal challenges AI brings. One of them is the Directive on copyright and related rights in the Digital Single Market (the ‘Directive’), which touches upon text and data mining (TDM), a technique automating the analysis and extraction of information from large datasets. TDM techniques are being used to train GenAI models. As copyright law may complicate TDM use, the Directive introduced two exceptions to enable TDM in Articles 3 and 4, respectively: use for scientific research and, unless the rightsholder objects, for other purposes, e.g. commercial.

The rise of GenAI prompted EU lawmakers to revisit the proposal, still under negotiation, for a Regulation laying down harmonised rules on artificial intelligence (the ‘AI Act’), the first comprehensive framework for the use of AI. Following the EU Parliament’s work on the AI Act, GenAI providers would have to comply with transparency requirements, including documenting and publishing sufficiently detailed summaries of the copyrighted data used for training. This requirement could help copyright holders in the case of unlawful use of their work. However, it remains unclear how exactly such a summary should look and whether producing it is technically feasible for AI developers. An agreement on the final wording of the AI Act is expected by the end of this year.

What copyright issues do developers of GenAI face?

Recent litigation over GenAI models often involves claims of copyright infringement arising from the use of copyrighted data in training without the authors’ permission or in violation of licensing terms.

Authors Mona Awad and Paul Tremblay have filed a lawsuit against OpenAI, alleging copyright infringement. They base their claim on the fact that ChatGPT generated accurate summaries of their books, which, according to the authors, suggests that OpenAI may have used books from illegal shadow libraries to train its GenAI model. Proving this, however, may be challenging: ChatGPT could have drawn on various online sources, not just the authors’ books or derivative works. Similarly, in the field of AI-generated art, Getty Images is suing Stability AI, the creator of Stable Diffusion, for copyright infringement, alleging that it copied millions of images to train its AI model without permission or compensation.

In turn, in the software field, a group of programmers sued Microsoft, GitHub and OpenAI for violating open-source licences by not crediting the authors of the source code on which Copilot, GitHub’s AI coding assistant, was trained. According to the plaintiffs, the “Defendants stripped Plaintiffs’ (…) attribution, copyright notice, and license terms from their code in violation of the Licenses”. They claim that Copilot’s goal is to replace open source by copying it and keeping it behind a paywall, thus monetising the code.

Non-profit media organisation NPR reports that the New York Times is considering suing OpenAI “to protect the intellectual property rights associated with its reporting”. Apparently, the two parties have been negotiating a licensing agreement granting the Times royalties for incorporating its data in AI solutions, but to no avail.

Although the described lawsuits are taking place mainly in the US, their outcomes might shape how the concerns surrounding GenAI are addressed under copyright laws not only there but also in other parts of the world.

Is it ‘fair’ to use someone else’s copyrighted work to train GenAI?

The question at the heart of many of these legal battles is one of fair use. Should it be considered ‘fair’ to use copyrighted data to train a model? And what happens when the model is used for commercial purposes?

The Authors Guild v. Google, Inc. case (the Google Books case) might be relevant in emerging lawsuits in the USA. The court ruled that Google was protected by the fair use doctrine because its use of the copyrighted works was transformative, making information about the books available without substituting them.

What European courts will decide under the Directive when similar claims arise is yet to be seen. While Article 4 provides a tool to defend against such claims, it has also been criticised as leaving insufficient leeway for TDM users. In Poland, this article has yet to be transposed into national law.

Does copyright protect AI responses?

Another topic surrounding GenAI is whether the AI’s response to the user’s prompt can be considered copyrightable. In the Tremblay v. OpenAI case, the plaintiffs argue that every output from ChatGPT is an infringing derivative work because it relies on expressive information from their works, violating their copyright.

EU copyright law protects the author of a work, who must be a human and not an AI-driven machine. Would AI-generated works therefore be subject to copyright? The simple answer might be no, because they are not a human creation. But shouldn’t the answer depend on how far the prompt shapes the outcome? It seems possible that GenAI, or at least some of its models, could be seen as a creative tool fulfilling the artistic intent of its user, thus bringing the output closer to a copyrightable work.

What to expect at the intersection of copyright and AI?

It seems that the way AI solutions are used will remain of keen interest, above all to those whose work can be exploited this way. For now, creators focus on the use of their works by AI developers. However, users should also be mindful of the risk that an AI-generated output may reproduce, for example, whole passages from copyrighted works without the user even being aware of it. Further use of such text may have legal consequences for the user.

Interestingly, one of the biggest AI developers, Microsoft, is willing to assume “responsibility for the potential legal risks involved” if users are challenged on copyright grounds over Copilot’s output, as per Microsoft’s announcement of its new ‘Copilot Copyright Commitment’ on 7 September 2023. At the time of submitting this text for publication, it remains unclear how Microsoft’s guardrails and filters work. Whether other AI developers will follow with a similar assumption of legal responsibility remains to be seen.