Background
Under the European AI Act (Regulation (EU) 2024/1689, “AI Act”), providers of General-Purpose AI (“GPAI”) models, such as models of the GPT family, Llama or Gemini, must comply with certain requirements, such as drawing up documentation and putting in place a policy to comply with EU copyright law.
To make compliance with these requirements easier, the AI Act foresees the creation of Codes of Practice for the use of GPAI models. At the invitation of the AI Office, various experts and stakeholders set up four working groups to draft a first Code of Practice. If the EU Commission approves this Code of Practice, it will have “general validity” within the EU. By adopting the approved GPAI Code of Practice, companies can demonstrate proactive compliance, potentially avoiding regulatory scrutiny and penalties.
The AI Office has now published the working groups’ third draft of the Code of Practice (“3rd Draft”), covering the following topics:
- Commitments
- Transparency
- Copyright
- Safety and Security
The final version of the Code of Practice is scheduled for 2 May 2025.
Below we discuss important details of the copyright section of the 3rd Draft. Compared to the previous second draft (“2nd Draft”), the 3rd Draft has been streamlined and shortened. In particular, the 3rd Draft, unlike the 2nd Draft, generally requires that compliance be proportionate to the size and capacities of the provider.
Who is this relevant for?
The Code of Practice is primarily relevant for providers of GPAI models. GPAI models are models that display significant generality and are capable of competently performing a wide range of distinct tasks. These may be providers of the well-known large language models such as GPT (OpenAI), Llama (Meta), Gemini (Google) or Mistral (Mistral AI). Smaller model providers may also be affected, as long as their models can be used for a wide range of tasks. Likewise, businesses that fine-tune models for their own purposes may become GPAI model providers.
Additionally, “downstream providers”, i.e. businesses that implement GPAI models into their AI systems, should familiarize themselves with the Code of Practice. The Code of Practice may become a quasi-standard for GPAI models as to what AI system developers can expect and not expect of a GPAI model. This may be considered when negotiating contracts with GPAI model providers.
Key concepts of the Code of Practice on copyright law
Providers of GPAI models are bound to put in place a policy to comply with EU copyright law (Art. 53 (1) (c) AI Act). Since there has not been any similar requirement so far, there is no practical guidance on what such a policy should look like. The Code of Practice is intended to fill this gap.
The Code of Practice requires providers to implement the following measures:
Providers signing the Code of Practice (“Signatories”) must draw up, keep up-to-date and implement a copyright policy on compliance with EU copyright law. This is already directly required under the AI Act. Signatories shall also ensure compliance with the copyright policy within their organization.
As an important change under the 3rd Draft compared to the 2nd Draft, Signatories are no longer required to publish the copyright policy but are only encouraged to do so. The lower requirement makes sense, as the AI Act does not oblige model providers to publish their copyright policy either.
Signatories are generally allowed to use web crawlers for the purposes of text and data mining (“TDM”) to obtain training data for their GPAI models. However, they shall ensure that crawlers respect technologies restricting access to copyrighted materials, such as paywalls.
Additionally, Signatories are required to exclude so-called “piracy domains,” i.e., internet sources that make a business out of providing copyright-infringing materials.
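For technical teams, the exclusion of piracy domains could be implemented as a simple filter applied before a URL is crawled. The following Python sketch is only an illustration under stated assumptions: the blocklist contents, the domain name and the function name are hypothetical, and the 3rd Draft does not prescribe any particular implementation.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist for illustration only; a real list would have to
# come from a maintained source chosen by the provider.
BLOCKED_DOMAINS = {"piracy-example.invalid"}


def is_excluded(url: str, blocked: set[str] = BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return any(host == domain or host.endswith("." + domain) for domain in blocked)
```

Matching on the full host name or a dot-separated suffix avoids accidentally excluding unrelated domains that merely contain a blocked name as a substring.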
Signatories shall ensure that web crawlers identify and comply with a TDM opt-out declared by rightsholders. TDM is generally allowed under EU copyright law, but rightsholders may opt out. For web content, the opt-out shall be machine-readable. The 3rd Draft specifies the requirements for web crawlers by stating that they shall identify and comply with the widely used robots.txt protocol. Additionally, web crawlers must adhere to other relevant machine-readable TDM opt-outs, such as metadata established as an industry standard or solutions widely adopted by rightsholders.
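To illustrate what respecting robots.txt can mean in practice, the following Python sketch uses the standard library module urllib.robotparser to decide whether a crawler may fetch a given URL. The crawler name "ExampleTrainingBot", the sample robots.txt content and the function name are assumptions made for this example. The sketch covers only robots.txt and not the other machine-readable opt-outs mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content (assumed for illustration): the site opts out
# of crawling by "ExampleTrainingBot" and restricts /private/ for all others.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt content allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The opted-out crawler must skip the page; another crawler may fetch it.
print(may_crawl(EXAMPLE_ROBOTS_TXT, "ExampleTrainingBot", "https://example.com/article"))
print(may_crawl(EXAMPLE_ROBOTS_TXT, "OtherBot", "https://example.com/article"))
```

In a real crawler, the robots.txt content would be retrieved from the target site before any page is requested, and the result of the check would be logged so that compliance can be demonstrated later.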
Signatories shall take reasonable measures to inform rightsholders about the web crawlers in use and how these crawlers handle robots.txt. The information can be disseminated, for example, via a web feed. Notably, the 3rd Draft no longer contains an obligation to publish this information.
GPAI model providers may also obtain datasets from third parties instead of crawling the web themselves. While the 2nd Draft required copyright due diligence on third-party datasets, the 3rd Draft requires Signatories to make reasonable efforts to obtain information as to whether the web crawlers used to gather the data complied with robots.txt.
A risk when using AI is that the AI may generate output that infringes copyright, e.g. by duplicating code or a picture that was found online but is nonetheless subject to copyright protection.
Signatories shall make reasonable efforts to mitigate this risk. This is a welcome relief compared to the 2nd Draft, which prescribed measures to avoid “overfitting”. The draft is now more technology-neutral and refers only to reasonable efforts.
Additionally, Signatories shall include a passage in their terms and conditions (or similar documents) for providers of downstream AI systems to prohibit a copyright-infringing use of their GPAI model.
Signatories must provide a point of contact for rightsholders. Additionally, they must implement a mechanism to allow rightsholders to submit complaints about copyright infringements.
Under the 3rd Draft, Signatories may refuse to process complaints that are unfounded or excessive.
Conclusion and recommendations for businesses
Compared to the 2nd Draft, the 3rd Draft contains some reasonable changes that allow businesses to comply with the Code of Practice in an adequate way. This is likely to make it more practicable for businesses to actually use the Code of Practice to comply with the AI Act.
Yet, it must be understood that the Code of Practice is still only a draft and may be subject to substantial change. It is likely but not guaranteed that the EU Commission will approve the final Code of Practice.
The working groups will now receive feedback from stakeholders until 30 March 2025 and present a final version in May 2025.