Unlike traditional software solutions, AI models evolve over time, often integrating third-party data and relying on dynamic training mechanisms. While a traditional due diligence approach can be taken for the software components, the AI component of the solution calls for a specific approach. In practice, this can be a complex exercise when the solution is licensed in to be licensed out to customers, the model has been pretrained with third-party data, and the model may be trained on an ongoing basis with multiple data sources, including customers’ data.
A fundamental step in due diligence is determining the ownership structure of the AI solution, including its software, models, and training data.
Key questions for IP title assessment
- Has the vendor developed the software and model in-house or commissioned a third party to do so?
- Does the software or model incorporate, or is it built using, third-party code, foundation models, or other third-party intellectual property?
- Does the algorithm, model, or any other part of the solution use open-source components?
Vendors should provide comprehensive responses to confirm ownership or licensed rights over critical components.
Key questions where all or part of the AI solution is under license
- Is the vendor able to grant a license with a right to sublicense the solution to customers?
- Does this right to sublicense meet customers’ intended purposes?
- Does the license allow customisation of the solution for the customer?
The responses may affect the structure of the contractual relationship. Where the vendor offers its solution on a software-as-a-service (SaaS) basis, with end customers accessing and using the solution directly from the vendor platform, and where third-party components cannot be sublicensed, a direct contractual relationship between the vendor and the customer may be contemplated for the purpose of the license. The scope of the license should also be carefully reviewed to the extent modification or customisation is anticipated.
Key IP considerations on training data
To the extent the AI solution has been pretrained, a thorough due diligence process should be conducted to assess:
- The type of data used for pretraining purposes.
- The sources and methods of obtaining training data, e.g. vendor-owned data, third-party licensed data, open data sets, or data obtained through web scraping.
Specific attention should be given to data obtained through web scraping. In some jurisdictions, the vendor may rely on copyright exceptions such as the text and data mining exception. In most cases, however, these exceptions are narrowly framed and subject to a number of conditions. For instance, the text and data mining exception may be subject to an opt-out right for rights holders and to the destruction of all data used for the mining. This requires the vendor to have put in place robust procedures to ensure that its training process excludes any risk of infringement, both during and after the process.
- Where the model has been trained, particularly if training took place outside the customer’s home jurisdiction.
Even if the training took place in a more permissive jurisdiction, deploying and using the solution in another jurisdiction may still give rise to a risk of infringement.
- The frequency and process for retraining the model, including for purposes of updating training data sets.
Managing input and output data
The nature and rights over the input data should be assessed, including whether it is customer-owned or third-party data.
Further assessment should be conducted in relation to vendor policies and procedures for entering and handling input data:
- How the vendor processes and stores input data, including whether the vendor will segregate the customer’s input data from vendor and third-party data.
- What the vendor’s retention and destruction policies and procedures are for input data.
The due diligence should also address both the customer’s and the vendor’s expectations about the vendor’s permissible use of input data, including whether the vendor may use input data:
- Solely for the customer’s purposes and benefit.
- To improve the AI tool and benefit other users.
- For any other purpose.
Detailed information about the vendor’s technical processes and contractual requirements should also be requested. For example, if the customer intends to restrict the use of input data as training data, this includes any opt-out procedures the customer must follow to exclude input data from training data sets.
Regarding output data, the nature of the data produced should be assessed and, more specifically, whether the output data is newly generated content or a response to input data based on predefined rules, such as a prediction, recommendation or decision. Specific due diligence should also be conducted to determine each party’s expectations around intellectual property rights in the output data, the permitted uses of that data, and where it will be used or distributed, particularly if outside the customer’s home jurisdiction.
Business continuity: addressing vendor insolvency risks
The solution will require ongoing maintenance, updates, and support. A key risk is vendor insolvency, which could leave customers without access to the solution. To mitigate this risk, due diligence should explore whether an escrow agreement is in place to ensure continued access in the event of vendor failure. The agreement should specify:
- When and how customers can access the source code and relevant documentation.
- Which usage rights are granted at that point.
- Who bears the cost of maintaining access and updating the solution.
The due diligence process should also anticipate the technical capabilities, whether internal or external, that will be required once access has been secured. Ultimately, the IP due diligence aims to: