Szymon Sieniewicz, Head of TMT/IP practice, Linklaters Warsaw

Artificial Intelligence (AI) is by no means a new phenomenon. In the last year, however, generative AI has made headlines as the launch of platforms such as ChatGPT, DALL-E 2 and Stable Diffusion has put this technology into the hands of the public for the first time. These platforms are now used by millions of people to produce text, art, music, films, code and much more. Generative AI raises a plethora of ethical and legal risks, not least the potential for AI tools to misuse personal data, create biased outputs and spread misinformation. In this article, however, we’ll focus on generative AI and intellectual property (IP) rights.

AI inputs – how are the models trained?

AI tools are typically trained on millions, if not billions, of words and images. The data are usually taken from publicly available online sources. OpenAI has said that DALL-E 2 (a generative AI tool which creates images from natural language descriptions) is trained on hundreds of millions of captioned images from the Internet. Training AI models on publicly available data scraped from the Internet may infringe various IP rights.

Copyright

Training AI models may involve copying various copyright-protected works. Copyright subsists in original works constituting their author’s own intellectual creation. The basic principles of copyright protection are similar across the EU. Under Polish law, copyright protects any manifestation of creative activity of an individual nature, in any form, regardless of its value, purpose and method of expression. In principle, copying copyright-protected works requires the permission of the rightholder (i.e., the author or any successor in title). Training models on publicly available works, including books, music and art, using training processes which inevitably create copies of the underlying works, may lead to copyright infringement if the rightholders have not given consent and no exception applies.

Database rights

In addition to copyright protection, there are also separate rights for databases in the EU (sui generis database rights). Database rights subsist if there has been a substantial investment in obtaining, verifying, or presenting the contents of the database, and give the rightholder the right to prevent the unauthorised extraction and/or re-utilisation of the whole or any substantial part of that database. Training AI models by copying materials sourced from third-party datasets may therefore also infringe sui generis database rights.

Text and data mining exception

So, holders of copyright and database rights can prevent unauthorised copying of their copyright works and databases. However, there are various exceptions to this general principle. Of particular relevance in the EU is the “text and data mining” exception under Directive (EU) 2019/790 on copyright and related rights in the Digital Single Market. This exception, which all EU Member States are required to implement into their national laws, applies to both copyright and database rights. It permits the copying of all lawfully accessible works for the purposes of text and data mining (which is commonly understood to include training AI models). However, this is not an absolute exception. Rightholders can opt their works out in an appropriate manner, including by using machine-readable means in the case of content made publicly available online (e.g. a robots.txt file).
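
To illustrate what a machine-readable opt-out can look like in practice, the following is a minimal sketch using Python's standard `urllib.robotparser` module. The crawler name `ExampleAIBot`, the second crawler name `SearchBot`, the URL and the robots.txt content are all hypothetical and chosen for illustration only. The sketch shows only the technical check a well-behaved crawler might run before collecting a page; it says nothing about whether a given robots.txt rule is legally sufficient as an opt-out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a rightholder might publish: it blocks one
# (made-up) AI training crawler while leaving other crawlers unaffected.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/gallery/painting.png"  # hypothetical URL
    for agent in ("ExampleAIBot", "SearchBot"):
        print(agent, "allowed:", may_crawl(ROBOTS_TXT, agent, page))
```

In this sketch the opted-out crawler is refused and any other crawler is allowed, which mirrors the idea that the opt-out is expressed per work (or per site) and must be honoured by the party doing the mining.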

Confidential information and patents

Many of the more sophisticated chatbots, including ChatGPT, allow users to input significant amounts of text and data as prompts. There is therefore a risk that user inputs, particularly those made by employees using these chatbots in the course of their employment, may include confidential information, such as customer data or proprietary code. Such inputs typically become available to the supplier of the AI platform and may be used by it to further train the AI model. As a result, they may in turn become available to other users, including competitors.

This may also impact the availability of patents. One of the requirements for obtaining a patent is that the invention must be novel at the date of filing. So, until the patent application is filed, the invention should be kept secret. While some countries allow a short grace period for filing after a public disclosure by an applicant, not all do. That means that the disclosure of an invention to an AI platform, such as when preparing patent claims, without proper confidentiality obligations in place, could, in the worst case, lead to the relevant patent office refusing to grant a patent for that invention.

Who owns the AI outputs?

The status of AI-generated outputs is of great practical importance for both the providers and users of generative AI systems. In most jurisdictions, including Poland, works generated solely by AI are not protected by copyright. This is consistent with the approach of the US Copyright Office, which has refused to register works containing solely AI-generated materials. However, some countries, including the UK, do extend copyright protection to “computer-generated” works, i.e., those “generated by computer in circumstances such that there is no human author of the work”.

Moreover, many countries, even those that do not recognise copyright in solely AI-generated works, acknowledge that AI-generated works which are significantly edited or influenced by a human author may be protected by copyright. This leads to a somewhat complex position where the IP protection available for any work is difficult to assess – and in any event, it varies from country to country.

AI Act – the EU’s attempt to regulate AI systems

The EU is currently working on an AI Act. The proposed regulation will impose many obligations on providers of AI systems based on the level of risk associated with them. The higher the identified risk level, the stricter the rules. Due to the high popularity of generative AI tools, the European Parliament has proposed some additional transparency obligations for generative AI systems, including obligations on providers to publish summaries of copyrighted data used for training and to identify AI-generated content. While this may raise practical concerns for providers, it is a helpful step for IP rightholders in policing their rights in any materials they have opted out.

What’s next?

Legislators are struggling to keep up with the rapid development of technology in this area. However, the first lawsuits relating to generative AI and IP are already ongoing in various countries around the world. We expect these to clarify many practical aspects of the use of generative AI systems and to be followed by more cases as this technology becomes even more mainstream.