May 22, 2025
The EU Intellectual Property Office (“EUIPO”) has recently published an in-depth study on the relationship between generative artificial intelligence (“GenAI”) and copyright law. The study summarizes many of the legal issues currently under discussion. It also provides a detailed overview of possible technical solutions for (1) declaring an opt-out against AI training and (2) marking GenAI output in a machine-readable format so that it is detectable as artificially generated.
According to EU Copyright Law, GenAI training is generally permitted as “text and data mining” (“TDM”) unless the use of works has been expressly reserved by their rightsholders (so-called opt-outs). Such a reservation must be declared in an “appropriate manner”, such as by “machine-readable means” in the case of content made publicly available online.
In addition, providers and deployers of most GenAI models will soon have to comply with transparency obligations under the EU AI Act. In particular, providers of GenAI systems will have to ensure that the outputs of the AI system are marked in a machine-readable format and are detectable as artificially generated or manipulated.
Neither EU Copyright Law nor the EU AI Act specifies how these requirements can be met in practice. It comes as little surprise that there is currently considerable legal uncertainty surrounding both requirements.
TDM plays a central role in developing GenAI models. However, there is so far no established market standard for exercising TDM opt-outs. This creates enormous challenges for rightsholders, providers of training datasets and GenAI developers alike.
The EUIPO compares several technical solutions and distinguishes between legal and technical opt-outs. In a nutshell, the EUIPO provides a sophisticated comparison based on over 17 different criteria, such as the typology of the opt-out (location-, file-, work- or repertoire-based), versatility, robustness, ease of implementation and offline/online applicability. The EUIPO considers none of the compared solutions ideal, especially due to their inherent lack of enforcement.
According to the study, the Robots Exclusion Protocol (“REP”) can currently be considered a de facto standard for restricting web scraping. However, the EUIPO considers REP only a temporary solution for TDM opt-outs, mainly because of its limited granularity and use-specificity.
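To make the granularity point concrete, the following is a minimal sketch of how a crawler might evaluate REP rules, using the robots.txt parser from the Python standard library. The robots.txt content, the crawler name "ExampleAIBot" and the URL are hypothetical and chosen for illustration only. The sketch shows that REP rules apply per crawler and per path; they cannot express a reservation for an individual work or for a specific use such as AI training.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one (made-up) AI crawler from the whole
# site and leaves all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"
    print(may_fetch(ROBOTS_TXT, "ExampleAIBot", url))
    print(may_fetch(ROBOTS_TXT, "OtherBot", url))
```

Note that compliance is voluntary: the check above only has an effect if the crawler chooses to run it, which is consistent with the enforcement gap the study identifies.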
Legally driven measures include unilateral opt-out declarations (such as simple letters from rightsholders or databases of digitized declarations), licensing constraints, and website terms and conditions.
Technical measures include the TDM Reservation Protocol, the C2PA Training and Data Mining Assertions, the JPEG Trust Rights Declaration, Spawning.AI’s solutions, the use of Liccium Infrastructure, and the Valunode Open Rights Data Exchange.
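As an illustration of what a machine-readable reservation can look like, the sketch below reads a TDM Reservation Protocol signal from HTTP response headers. It assumes the header names "tdm-reservation" (value "1" for a reservation) and "tdm-policy" (a URL pointing to licensing terms) as we understand them from the W3C Community Group specification; these should be checked against the current version of the specification before relying on them. The function and type names are our own and are not part of any standard.

```python
from typing import Mapping, NamedTuple, Optional


class TdmSignal(NamedTuple):
    reserved: bool              # True if TDM rights are reserved
    policy_url: Optional[str]   # optional pointer to licensing terms


def read_tdm_headers(headers: Mapping[str, str]) -> Optional[TdmSignal]:
    """Interpret TDM reservation headers; return None if no signal is present.

    Header names are assumptions based on the TDM Reservation Protocol.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    value = normalized.get("tdm-reservation")
    if value is None:
        return None
    return TdmSignal(reserved=(value == "1"),
                     policy_url=normalized.get("tdm-policy"))


if __name__ == "__main__":
    example = {
        "Content-Type": "text/html",
        "TDM-Reservation": "1",
        "TDM-Policy": "https://example.com/tdm-policy.json",  # hypothetical URL
    }
    print(read_tdm_headers(example))
```

Unlike a robots.txt rule, a signal of this kind is tied to the purpose (TDM) rather than to a particular crawler, which addresses the use-specificity concern raised above.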
Compared to the opt-out mechanisms above, the transparency requirements are even newer and therefore less thoroughly examined. The study provides helpful guidance and differentiates between types of AI output such as music, books and images. In this regard, the EUIPO explains two different approaches to transparency solutions. Examples of these techniques are compared in detail based on ten criteria, including market maturity, cost implications, robustness, interoperability, scalability and reliability.
Provenance tracking is an approach that seeks to certify the entire lifecycle of a digital asset to ensure a reliable record of the asset's history. This history is often encoded in a machine-readable format into the content’s metadata. Examples are C2PA, IPTC Photo Metadata Standard, JPEG Trust or Trace4EU.
Content-processing solutions include watermarking and fingerprinting. In this approach, the relevant information is tied directly to the digital content itself rather than to its metadata. Fingerprinting identifies unique patterns in content to detect copies, while watermarking embeds provenance information in the content to prevent or trace unauthorized use.
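The difference between the two techniques can be shown with a deliberately simplified sketch. It treats an image as a flat list of 8-bit grayscale values, uses an "average hash" as a toy fingerprint and hides a few bits in the least significant bit of each pixel as a toy watermark. These are textbook illustrations chosen by us, not the methods evaluated in the study, and production systems are far more robust against cropping, compression and other edits.

```python
from typing import List, Sequence


def average_hash(pixels: Sequence[int]) -> int:
    """Toy fingerprint: one bit per pixel, set when the pixel exceeds the mean."""
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests a likely copy."""
    return bin(a ^ b).count("1")


def embed_watermark(pixels: Sequence[int], bits: Sequence[int]) -> List[int]:
    """Toy watermark: overwrite the least significant bit of the first pixels."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the image")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_watermark(pixels: Sequence[int], length: int) -> List[int]:
    """Read back the least significant bit of the first `length` pixels."""
    return [value & 1 for value in pixels[:length]]


if __name__ == "__main__":
    image = [10, 200, 30, 220, 15, 210, 25, 230]   # hypothetical pixel values
    mark = [1, 0, 1, 1]                            # hypothetical payload
    marked = embed_watermark(image, mark)
    print(extract_watermark(marked, len(mark)))
    print(hamming_distance(average_hash(image), average_hash(marked)))
```

The fingerprint is computed from the content and stored elsewhere, so the content itself is unchanged; the watermark alters the content slightly so that the information travels with every copy.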
In addition, the study provides information on how to detect AI-generated content and how to determine whether a GenAI model has been trained on a specific data point. Although these are not transparency measures as such, they will play a critical role in safeguarding consumers against deception and in uncovering what data a model was trained on. A possible solution for detection would be Nvidia’s StyleGAN3 detector.
The study does not give answers to legal questions. However, in certain areas, it is a gold mine for learning about the current market standard for technical solutions. Many other important topics are also addressed. Notably, the study provides an overview of the following: (1) the developing licensing market for training data, (2) AI court proceedings, and (3) other copyright-related issues, such as content memorization inside the model and overfitted output.