13 May 2024
The EU AI Act was meant to be the first attempt by a regulator to set guardrails for an evolving, disruptive technology. Its underlying concept is a risk-based approach, and the concerns were mostly product safety considerations, energizing a generation of legislators who grew up with movies like Stanley Kubrick’s “2001: A Space Odyssey” and later films like “The Terminator” or “The Matrix”, not to forget … nope, we’re getting off track in the first paragraph.
And while those legislators were diligently carving out principles to save us from HAL 9000 and Arnie, ChatGPT made a surprise appearance in November 2022. Jeeeez, we need to inject some copyright rules. What followed was a year of legal frenzy: US stakeholders launched court actions on copyright issues, China supplemented its broad set of AI regulations with the “Measures for the Management of Generative Artificial Intelligence Services 2023”, and the EU hastily added provisions on generative or “general purpose” AI (“gpAI”) that touch, inter alia, on some copyright aspects.
Looking more closely at copyright in the context of AI, the “usual suspects” are the use of works for model training (1), the protectability of input and generated output (2), and how to deal with the risk of potentially infringing output (3). Here is our update on the occasion of the EU AI Act’s entry into force, which is expected in the coming months:
Human reasoning relies on gathering available information, processing it against relevant memories stored in the brain and generating a suggestion as the result of that analysis. The more experience gathered, the better the suggestion should become. The same applies to AI, which needs large amounts of machine-readable content for its development and training. The EU Digital Single Market Directive 2019/790 of 17 April 2019 (“DSM Directive”) enacted text and data mining exceptions to the reproduction right, namely for the purpose of scientific research by research and cultural heritage institutions in its Art. 3, and for anyone in its Art. 4, the latter, however, allowing rightsholders to ‘opt out’. After the launch of ChatGPT, there was controversial discussion about whether Art. 4 DSM Directive covers the use of works, in particular works made available on the Internet, for training AI models, given that such training involves generating at least temporary reproductions of them. Representatives of authors and the creative industries raised concerns, inter alia, about whether and how it would be possible to verify that the model trainer had respected the limits of the text and data mining exception.
The AI Act (in its current form as approved by the Council of the EU on 14 May 2024) does not introduce new copyright exceptions. However, the wording of recital 105 is crystal clear on this point: the use of copyright-protected content requires the authorization of the rightsholder unless an exception applies. The text and data mining exceptions of the DSM Directive apply as such. Where rightsholders have reserved their rights to prevent text and data mining, model providers need to obtain their authorization before using such protected content for text and data mining purposes.
Nevertheless, the AI Act also addresses the practical problem of verifying compliance: all providers of general-purpose AI models must comply with the documentation obligations set out in Art. 53 (1) AI Act and Annex XI. These include documenting the training and testing process, which requires, among other things, a detailed description of “information on the data used for training, testing and validation, where applicable, including the type and provenance of data and curation methodologies (e.g. cleaning, filtering etc.), the number of data points, their scope and main characteristics; how the data was obtained”. The provider must draw up this documentation, keep it up to date and provide it, upon request, to the AI Office and the national competent authorities.
To accommodate the authors’ concerns, the AI Act imposes two more specific requirements on providers: under Art. 53 (1) lit. (c), they must put in place and maintain a copyright policy, including state-of-the-art technologies, to identify and comply with opt-out reservations of rightsholders pursuant to Article 4(3) of the DSM Directive. Furthermore, Art. 53 (1) lit. (d) AI Act obliges providers to “draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model”. To ensure a reasonable and uniform standard for this summary, the future AI Office is tasked with providing a template.
When this documentation concept first came out, it was argued that it paints a target on the back of AI model developers. The AI Act’s approach of confirming the applicability of the text and data mining exception on the one hand, while introducing documentation duties on the other, is a compromise: AI models only work if they have been trained, and some of the training material is protected by copyright.
Pending the AI Office’s template, recital 107 gives a first indication of what the “sufficiently detailed summary” should look like: it should be generally comprehensive in scope rather than technically detailed, e.g. by listing “the main data collections or sets that went into training the model”. Until the template is available, providers may be well advised to develop industry best practices. It will also be interesting to see how training materials used in the past will be documented going forward.
Court cases on whether the training of AI models stayed within the limits set by the text and data mining exceptions will soon shed light on the details of the copying that took place during data preparation and model training. Once the facts are established, the parties are likely to concentrate on arguing either that such copying was merely transient and/or temporary and served only to gather the underlying, non-protectable concepts and ideas of the work, or the opposite: that the copying exceeded the limits of the exceptions, which are to be interpreted narrowly, by sucking out the essence of the protected work.
In a case currently pending before the Hamburg Regional Court, a stock photographer is suing the non-profit organization LAION, which offers the LAION-5B dataset used to train large image-text models. The lawsuit alleges unlawful copying and seeks to have the images removed from the training set. LAION, in contrast, relies primarily on the general text and data mining exception under Art. 4 DSM Directive, but also on the text and data mining exception for the purposes of scientific research under Art. 3 DSM Directive, which does not provide for an ‘opt-out’. The oral hearing is expected in July 2024.
There are exceptions for open-source AI, which are, however, of limited practical relevance: the obligations under Art. 53 (1) lit. (a) and (b) AI Act do not apply if the provider offers the gpAI model under a “free and open-source license” (Art. 53 (2)). The copyright-related duties under lit. (c) and (d) therefore continue to apply. Moreover, this relief does not apply if the gpAI model falls into the category of models with systemic risk. Furthermore, recital 103 specifies that the exception only applies if the model is not provided against a price or otherwise monetized, including through the use of personal data.
Recital 106 addresses the international dimension of the text and data mining exception and of fair market conditions: providers placing AI on the EU market should ensure compliance with the relevant obligations of the AI Act. In particular, any provider placing a gpAI model on the EU market should comply, regardless of the jurisdiction in which the copyright-relevant acts took place. The AI Act thereby aims to ensure a “level playing field” and to prevent advantages gained by applying lower copyright standards that may exist outside the EU, an approach that has already been criticized as a long-arm approach going beyond current copyright rules. As compliance with the requirements will be monitored (see e.g. Art. 89 AI Act), attempts to circumvent them may be uncovered.
The AI Act does not address protectability at all.
Traditionally, copyright protects creative results achieved by human beings. Consequently, human-made prompts may enjoy copyright protection, e.g. as literary works.
As skillful prompting is valuable, users should check whether the AI system provider reserves the right to use the content of prompts for its own purposes, including the further refinement of the AI system. By contrast, output generated, e.g., by a general-purpose model will in most cases probably not enjoy copyright protection. Some argue that there is no difference between using a camera to create a photographic work and using an AI system to create corresponding results. We do not believe such arguments carry much weight, as the level of involvement, contribution and influence on the creative process is simply not comparable: AI output stems from probability calculations taking place within a black-box system at a speed exceeding human perception.
Non-protectability of the output appears to be in line with the current position in the United States (see the US Copyright Office (USCO) in its February 2023 decision regarding Zarya of the Dawn, or the District Court for the District of Columbia in Thaler v. Perlmutter). Relevant criteria applied by the USCO include whether the ‘work’ is basically one of human authorship, with the computer merely being an “assisting instrument”, or whether the traditional elements of authorship in the work (literary, artistic, or musical expression or elements of selection, arrangement, etc.) were actually conceived and executed “not by man but by a machine.” In the case of works containing AI-generated material, the Office will consider whether the AI contributions are the result of “mechanical reproduction” or instead of an author’s “own original mental conception, to which [the author] gave visible form.” In China, by contrast, there seems to be a trend toward accepting that AI output may be eligible for copyright protection (Beijing Internet Court, Lee v. Liu).
In the EU, discussions have already started on whether there should be some level of protection for valuable output. Such protection could, e.g., come from a new neighboring right, as neighboring rights do not necessarily require human creation. Under current law, certain AI-generated output may already be protected by existing neighboring rights, such as those in phonograms or the sui generis database right.
Press reports about the lawsuits launched in the US in 2023 seem to indicate that some output, in particular from image-generating AI systems, bears features surprisingly similar to works of third parties.
The AI Act does not add any specific rule on infringement besides the already mentioned duty to put in place a policy to comply with Union copyright law. The EU legislator emphasizes in recital 108 that the AI Office should monitor whether providers fulfill those obligations without carrying out a work-by-work assessment of the training data in terms of copyright compliance. This seems likely to amount to a brief, formalistic check, so enforcement will continue to be driven mainly by rightsholders’ actions.
So far, there is no general standard of care for avoiding infringement. Some system providers shift the entire responsibility and risk onto the users. There is some logic to this approach, at least insofar as it is certainly possible to provoke infringing output through deliberately worded prompts. Other providers offer duplication filters intended to prevent infringing output; if users activate them, those providers promise to support users who face claims for infringing third-party rights based on the content so generated. In addition, some AI providers offer indemnities to their users in the event of infringing output, albeit with certain limitations.
As no relevant case law on infringing generative AI output is available yet, a number of questions remain open: How will existing tests for infringement apply in the context of (generative) AI? Can existing defenses to copyright infringement be applied? Who is liable: the user, the AI developer, or both?
So the AI Act certainly does not provide a comprehensive regulatory regime for the tensions between AI technologies and copyright. Many details needed to strike a reasonable balance between the opposing interests will have to be clarified by the courts. On the home stretch, the AI Act picked up some concepts on the applicability of the text and data mining exception and on documentation duties, which should help resolve the remaining legal issues. Finally, it is worth taking a brief look at the dates of application: most of the rules will apply two years after the EU AI Act’s entry into force (see Art. 113), presumably in summer 2026. The gpAI rules are on a fast track and will already apply 12 months after entry into force. Providers of gpAI models placed on the market before that date have until 36 months after entry into force to achieve compliance (Art. 111 (3)).