Individuals and businesses are increasingly using generative AI tools to perform a range of functions, from improving operational efficiency to generating content that is monetised as part of their products and services. As the use of generative AI proliferates, so too do associated legal disputes. Getty Images has started proceedings against Stability AI in the US and UK alleging that AI image generation tool Stable Diffusion infringes its copyright and trade marks. GitHub, Microsoft and OpenAI are fighting a class action in the US relating to the GitHub Copilot tool's generation and attribution of open source software. This article examines the IP ownership and infringement implications of the use and provision of generative AI services, and provides key points to consider when training or using generative AI tools.
At a general level, sophisticated generative AI models are trained on large datasets (eg of text, images, music, or software) and learn to recognise highly nuanced patterns and relationships within the training data. In theory, they should not memorise the training data itself (eg strings of text or images), but only the relational principles present in it. When responding to a prompt, the model interpolates from those learned principles to generate a response. In theory, the output should be wholly new, generated from scratch by the model; in practice, it can in some instances be similar or identical to material present in the training data.
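The difference between memorising and generalising can be illustrated with a deliberately simplified sketch (a hypothetical toy model, not how production systems work): a next-character model trained on a single sentence will, when sampled greedily, regurgitate its training text verbatim.

```python
from collections import defaultdict

def train(text, order=3):
    """Toy next-character model: maps each 3-character context
    to the characters observed following it in the training text."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, seed, length=60):
    """Greedily emit the most frequent continuation for each context."""
    out = seed
    for _ in range(length):
        choices = model.get(out[-3:])
        if not choices:
            break  # unseen context: the toy model cannot continue
        out += max(set(choices), key=choices.count)
    return out

corpus = "the quick brown fox jumps over a lazy dog"
print(generate(train(corpus), corpus[:3]))
# reproduces the training sentence verbatim
```

Real models are vastly larger, but the same failure mode (verbatim reproduction of training material) can arise where training data is over-represented or insufficiently varied.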
Ownership of the AI program
A multiplicity of stakeholders is involved in the creation of an AI system (eg programmers, data suppliers, trainers, feedback providers, investors funding creation of the system, and system operators). In many cases, the AI system software programmers will likely be seen as the first authors of the copyright in the AI system as a computer program, but questions of co-authorship could arise depending on the contributions of other individuals involved.
Ownership of the AI-generated output
Ownership of copyright in an AI program may not automatically result in copyright ownership over the future output created by the AI system (assuming copyright subsists in that output). Ultimately, who owns any copyright in the AI-generated output is a complex question and still unsettled in the UK.
The UK currently provides copyright protection for computer-generated works (CGW), ie works generated by a computer in circumstances where there is no human author of the work, subject to originality requirements. Under section 9(3) of the Copyright, Designs and Patents Act 1988, the author for copyright ownership purposes is the person by whom the arrangements necessary for the creation of the work were undertaken.
It is unclear whether AI-generated works fall firmly within the definition of CGWs. The government acknowledged this uncertainty and sought to address it through its consultations into AI and copyright in 2021 and 2022. It ultimately decided not to table immediate changes to the law regarding authorship because the use of AI is still at a nascent stage. It is possible the growing mainstream appeal of generative AI tools like ChatGPT could compel a reconsideration of current laws. For now, in the absence of any clear alternative, the best analysis in most cases is likely to be that AI-generated works fall within the definition of CGW and are protected by copyright on that basis.
Who precisely is the "person by whom the arrangements necessary for the creation of the work were undertaken" in the context of AI-generated output is not clear cut.
For corporate users who invest in the development of their training model to generate works for their own purposes and employ the software developers who create the code (eg a games developer using an AI system to generate images for its virtual world), it is more straightforward to identify them as having made the necessary arrangements.
Where there is arguably more than one "contributor" to the arrangements necessary for the work's creation, eg users who input into the creation of works by publicly available systems, the question is much less straightforward. Here, the user could, depending on their contribution, also claim authorship.
The answer may depend on the level of creativity contributed by the AI creator and the user. Where the AI for the most part automatically produces the generated works creatively and independently, with minimal human intervention (beyond inputting simple prompts), the output may more likely be owned by the AI creator. Where the generation of the new works is AI-assisted (ie AI functions as a tool to enable a user to achieve a particular result), but considerably more human intervention is necessary, the output may more likely be owned by the user.
Regardless of the default legal position, ownership of AI-generated outputs is, in practice, often determined by the AI service provider's terms and conditions. Conflict of laws considerations can also play a part in the analysis where the AI user and AI service provider are located in different jurisdictions.
The training and use of generative AI models can give rise to a variety of IP infringement risks at various stages of the process:
Stage 1: Obtaining the training data to train the AI
If the training data is obtained from unlicensed sources (eg by scraping) and consists of copyright-protected works (eg artworks, music, videos, or strings of text), copying/reproducing the data without a licence can infringe copyright. There can also be infringement of database rights if the data is extracted from a protected database.
Potential "non-commercial use" exception: Exceptions to copyright and database rights infringement may exist if the use is for non-commercial research or non-commercial text and data mining. However, any research that is used for a purpose which has some commercial value would not benefit from these exceptions.
Proposed text and data mining exception: The government has decided not to proceed with a proposed new copyright and database right exception, which would have permitted text and data mining for any purpose (including commercial ones).
Stage 2: Training process
During the training process, the training data may be stored, potentially in different formats, for the duration of the training, which may take several months. The making of new copies of the data for these purposes, including in encoded or compressed forms, without a licence could constitute further acts of copyright infringement.
The AI model is also likely to make temporary copies of the training data in its own memory while 'reading' it, which may also constitute copyright infringement.
Potential temporary copies defence: This defence applies to the making of copies that are transient or incidental and an integral and essential part of a technological process, the sole purpose of which is to enable a lawful use of the copyright work and which have no independent economic significance. This exception has not been tried in the UK courts in relation to the AI training process but could potentially apply depending on the circumstances.
Stage 3: Storing 'learned information'
If the AI creates and stores copies of the training data or parts of it, including as compressed versions of the original, rather than merely learning relational principles present in the training data, this could amount to another act of copyright and/or database rights infringement.
Even if an AI model stores information in an abstract form, provided the neural network is capable of reproducing a substantial part of the original, creative elements of a copyright work in the training dataset, this may amount to copying in a manner analogous to storing content in a compressed file format. This can occur as a result of the way the AI is trained or as a result of a poor-quality training dataset, eg one that has not been sufficiently de-duplicated.
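De-duplication of a training dataset, one of the mitigations mentioned above, can be sketched as follows (a simplified illustration using exact-match hashing of normalised text; real pipelines typically also detect near-duplicates):

```python
import hashlib

def deduplicate(documents):
    """Drop exact duplicates by hashing normalised text (simplified)."""
    seen = set()
    unique = []
    for doc in documents:
        # Normalise whitespace and case so trivial variants collapse together
        digest = hashlib.sha256(doc.strip().lower().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

docs = ["A sample work.", "a sample work. ", "Another work."]
print(deduplicate(docs))  # → ['A sample work.', 'Another work.']
```

Reducing repeated copies of the same work in the training set lowers the chance that the trained model can reproduce that work verbatim.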
Stage 4: Generation of AI-output
If the AI-generated works replicate a "substantial part" of a copyright work contained in the training data, or "substantial part" of a database from which training data was obtained, the creation and use of those works could amount to copyright and database rights infringement. This can occur if the AI essentially creates a digital collage of copied parts of the training data to create a "new" work, or where the model has 'memorised' the training data rather than learning more generalised principles from it.
There can potentially be copyright infringement even if no part of the original training data is replicated exactly but the AI-generated work as a whole gives a sufficiently similar impression to a work included in the training data (a non-literal copy).
There may also be a risk of trade mark infringement and/or passing off if the AI-generated output includes trade marks (eg logos) in a manner that gives rise to a likelihood of confusion or may lead the public to believe the output is somehow associated with or endorsed by the trade mark owner.
Stage 5: Onward use of AI-generated works
If an infringing AI-generated work is subsequently used, eg by offering copies for sale, posting online, or performing in public, this could amount to further acts of IP infringement.
Given the complexities of generative AI systems and the sometimes uncertain or untested application of IP laws to their output, it is important to consider the issues carefully and remain aware of incoming laws and relevant court decisions in this space.