Authors

Dr. Christian Frank, Licencié en droit (Paris II / Panthéon-Assas)

Partner


Dr. Julia Freifrau von Imhoff

Senior Associate


26 January 2024

Five takeaways on general purpose AI from the leaked AI Act

  • Briefing

Earlier this week, two documents were leaked which are said to contain the latest draft of the EU AI Act. The final wording has been “under construction” since the EU Parliament and the EU Council announced their political agreement on 9 December 2023. The last version circulated was the one including the amendments adopted by the European Parliament on 14 June 2023: the first attempt to deal with the game changer resulting from the launch of ChatGPT on 30 November 2022.

At that time, these amendments were obviously stitched together in haste, in an attempt to prevent one of the EU’s flagship regulatory projects from being outdated from the outset. In parallel, the US saw a first wave of claims from rightholders suing various AI providers, mainly for copyright infringement arising from the alleged use of their works in the training of the defendants’ AI systems. One of the major defences raised focused on a copyright exception for text and data mining resulting from the implementation of the EU’s Digital Single Market Directive 2019/790 of 17 April 2019. According to its Article 4, the use of copyright-protected content for text and data mining purposes is permitted unless the rightholders have expressly reserved the use of their content in an appropriate manner (“opt-out”), such as in machine-readable form in the case of content made publicly available online. Rightholders doubted that the exception applies and engaged legal experts to prepare and publish arguments as to why the training of AI models should not be covered by the text and data mining exception.

The now leaked text is said to be pre-final, and a decision on it in the Council is scheduled for 2 February. Of course, there are gaps here and there that will be filled in during the final editing process. Nevertheless, it contains significant changes showing that a lot of sweat of the brow has gone into it. Below, we share some impressions on the regulation of “general purpose AI” (“GPAI”), which has been at the centre of public attention:

1. GPAI models qualify at least as limited risk AI

Following the logic of a risk-based regulation with four buckets, GPAI models have now been placed in the third bucket, which comprises the limited-risk products subject to the specific transparency obligations set out in Art. 52. This classification puts an end to the controversy over whether or not GPAI models should in principle be classified as high risk. However, a new subcategory has been inserted: “GPAI models with systemic risk”. Article 52a contains two alternative classification criteria, both focusing on the “high impact capabilities” of the GPAI model. Such capabilities must be evaluated; they are presumed when the cumulative amount of compute used for training, measured in floating point operations, exceeds 10^25. Lovers of due process may wish to delve into the details of the procedure set out in Article 52b, which we won’t cover here. GPAI models with systemic risk must comply with further obligations set out in Article 52d which aim, among other things, to identify and mitigate these risks and to ensure adequate cybersecurity protection.
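The compute-based presumption can be reduced to a simple threshold test. The sketch below is purely illustrative; apart from the 10^25 FLOP figure taken from the leaked draft, the function and variable names are our own assumptions, and the actual classification also involves the evaluation procedure of Article 52b.

```python
# Illustrative sketch of the compute-based presumption in the leaked draft.
# Only the 10^25 FLOP threshold comes from the draft text; names and the
# example figures are hypothetical.

SYSTEMIC_RISK_FLOP_THRESHOLD = 1e25  # cumulative training compute in FLOPs

def presumed_high_impact(training_flops: float) -> bool:
    """Return True if a GPAI model is presumed to have 'high impact
    capabilities' because its training compute exceeds the threshold."""
    return training_flops > SYSTEMIC_RISK_FLOP_THRESHOLD

# A model trained with 2.1e25 FLOPs would trigger the presumption;
# one trained with 5e24 FLOPs would not.
print(presumed_high_impact(2.1e25))  # True
print(presumed_high_impact(5e24))    # False
```

Note that this is a rebuttable presumption, not the only route into the subcategory: the draft also allows a designation based on an evaluation of the model’s capabilities.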

2. The text and data mining exception covers the training of AI models

Although the AI Act will not introduce new exceptions to copyright, the wording of recital 60i is crystal clear on this point: the use of copyright-protected content requires the authorization of the rightholder unless an exception applies. Directive 2019/790 introduced the text and data mining exception under certain conditions, including that rightholders may reserve their rights to prevent text and data mining. If they have opted out accordingly, GPAI model providers need to obtain their authorization before using such protected content for text and data mining purposes.
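Neither the Directive nor the draft AI Act prescribes a specific format for the machine-readable reservation. One emerging convention is the W3C community specification TDMRep, which signals an opt-out via a `tdm-reservation` HTTP header. The sketch below assumes that convention; the function name and examples are our own.

```python
# Illustrative sketch of a machine-readable opt-out check under the
# TDMRep convention (a W3C community specification). The draft AI Act
# does not mandate this or any other specific format.

def tdm_reserved(headers: dict) -> bool:
    """Return True if the HTTP response headers signal a text and data
    mining reservation via a 'tdm-reservation' header with value '1'."""
    normalized = {k.lower(): v for k, v in headers.items()}
    return normalized.get("tdm-reservation") == "1"

# A publisher opting out might serve this header alongside its content:
print(tdm_reserved({"TDM-Reservation": "1"}))  # True
# Absent the header, no reservation is signalled in this scheme:
print(tdm_reserved({}))                        # False
```

A provider’s copyright policy (see section 3 below) would need to identify such signals, in whatever format rightholders use, and exclude the reserved content from training.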

3. Document your training!

All GPAI models must comply with the obligations set out in Article 52c, which incorporates concepts introduced by the EU Parliament in the former Article 28b on foundation models, such as the documentation requirement in its paragraph (1). This includes documentation of the training and testing process of the model. The minimum requirements are further specified in Annex IX a and comprise “information on the data used for training, testing and validation … including type and provenance of data and curation methodologies (e.g. cleaning, filtering etc.)” as well as “how the data was obtained and selected”. Furthermore, a copyright policy must be put in place to ensure that opt-outs under the text and data mining exception can be identified and respected.

When this documentation concept first came out, it was argued that it puts a target on the back of the AI model developer. The AI Act’s current approach of confirming the applicability of the text and data mining exception on the one hand while introducing documentation duties on the other appears to be a compromise: AI models only work if they have been trained, and some of the training material is protected by copyright. If such use is permitted as an exception to the principle that the rightholder decides, compliance with the limits of that exception should be verifiable. The “target-on-the-back” argument only holds if the model developer has not trained its model within the limits of the exception. The interest of the affected rightholders in fighting infringements of their rights, as well as the interest of compliant developers in competing on fair markets, are both worth protecting.

It is worth taking a brief look at the provisions on entry into application: the standard date of application is two years after entry into force, Article 85(2), so presumably the summer of 2026, should the AI Act be passed by the outgoing EU Parliament within the next few months. The GPAI rules are on a fast track, as Article 85(3) provides for their application within 12 months of entry into force. GPAI models already on the market when the AI Act comes into force will be given a two-year grace period to achieve compliance. It will be interesting to see how the documentation of training materials used in the past will be achieved: the text and data mining exception does not seem to allow the making of permanent copies of such materials, as copies “may be retained for as long as is necessary for the purposes of text and data mining”, Article 4(2) of Directive 2019/790.

4. Don’t try to circumvent this

Recital 60j deals with the international dimension of the text and data mining exception and of fair market conditions: don’t cheat by placing models on the EU market that disregard this minimum standard of protection while arguing that your AI model has been trained in another jurisdiction with lower copyright standards. This is not a long-arm approach dictating a global standard but just the old adage: if you want to play in my backyard, you play by my rules. That holds at least as long as this concept applies only to data directly or indirectly originating from the EU.

And it is precisely in this context that the importance of the documentation requirements becomes clear: anyone who secretly trains in another jurisdiction and then tries to conceal this must nevertheless submit training documentation that covers their model. As compliance with the requirements will be monitored (the draft AI Act contains seven separate provisions on this under Article 68a, currently referred to as Articles A to H), circumvention attempts will probably be uncovered.

5. Open source privileges

Some of these obligations do not apply if the provider offers the GPAI model under a “free and open [source] license”, Article 52(-2). According to recital (60i+1), the AI Act features a narrower understanding of free and open source than usual: it clarifies that this privilege only applies if the model is not provided against a price or other monetization, including the use of personal data.

So much for today. There is a lot more to cover; we will come back with more soon.
