3 October 2023

Open Source Software – 1 of 4 Insights

Time to adapt your Open Source Software policies to include AI code assistants’ output!

Lucas de Groot and Martijn Loth look at the impact of AI code assistants on Open Source Software.

Authors: Lucas de Groot (Associate) and Martijn Loth (Counsel)

To safely use AI code assistants, businesses developing software will need to vet the output of these code assistants using the same quality assurance mechanisms and policies adopted for freely available Open Source Software (OSS).

The use of Generative AI tools (GAI) – tools that take simple prompts and produce seemingly useful content, such as text, images or audio – has been, and still is, on the rise because of the purported efficiency and productivity gains. Many of these tools are also freely available for businesses to use, making their adoption both functionally and financially attractive.

One category of GAI that is particularly interesting to businesses developing software is AI code assistants, which can generate or auto-complete source code based on a high-level description of what is desired (i.e. the prompt), taking into account context such as function and variable names, required arguments, other open files in the IDE, and so on. Frequently seen examples of AI code assistants include GitHub Copilot, AskCody, Sourcegraph, Tabnine, Replit, and CodeWhisperer. Unfortunately, while the output of AI code assistants often looks original and safe to use immediately, this may not always be the case.
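To illustrate how such a completion works in practice, here is a minimal, hypothetical example (the function and the suggested body are invented for illustration and are not the output of any particular tool): the developer writes a signature and a one-line description, and the assistant proposes the remainder based on that prompt and the surrounding context.

    # Developer's prompt: a signature and a short description of the desired behaviour
    def is_valid_semver(version: str) -> bool:
        """Return True if the given string looks like a semantic version, e.g. '1.2.3'."""
        # --- completion proposed by the assistant (illustrative only) ---
        import re
        pattern = r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
        return re.match(pattern, version) is not None

The concern discussed below is that such a proposal may, in some cases, closely reproduce code the model saw during training.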

Similar to other types of GAI, AI code assistants are only capable of predicting useful source code if the models behind them have been properly trained. Training AI code assistant models requires huge amounts of source code, and vendors of closed source software are unlikely to be willing to participate without significant financial compensation. Given that OSS is already easily and freely available at repositories such as GitHub and SourceForge, it is understandable that vendors of AI code assistants turn to these OSS repositories to gather the necessary training data.

It is not, however, clear to what extent and under which circumstances the output of each AI code assistant regurgitates parts of the OSS seen during its training. Nor has it been settled in all jurisdictions to what extent this practice is permitted. In the US, the class action filed in November 2022 in the US District Court for the Northern District of California by anonymous developers against Microsoft Corporation, GitHub and others is based on allegations that the AI code assistant offers identical or near-identical reproductions of code scraped from public sources. More importantly, the suit argues that both the training (by Microsoft, GitHub and others) and the usage (by their customers) are unlawful.

Given this, businesses wanting to incorporate the output of AI code assistants into their own software would be wise to assume that output contains OSS unless they have received assurances to the contrary. Fortunately, most businesses developing software will likely already have an OSS policy that governs the inclusion of OSS in other software projects. They will most probably just need to amend the scope of that policy to include these AI code assistants, and implement an Acceptable Use Policy for GAI.

Those new to the OSS game should be aware that, although the term ‘open source’ suggests that OSS may be used freely without restrictions, this is usually not the case. OSS can only be used in accordance with the terms and conditions of its licence, with licences often categorised on a scale from “permissive” (i.e. having few restrictions and requirements, such as simply maintaining attribution) to “restrictive” (i.e. having more elaborate requirements that often impact commercial freedom, such as requiring modifications to the OSS to be distributed under the same OSS licence). And the licence often applies to “derivatives” – modifications to the original OSS – as well.
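By way of illustration, even a permissive licence such as the MIT licence requires the original copyright and permission notice to travel with the code. A minimal sketch of what retained attribution might look like in a source file (the project and author names are invented for illustration):

    # The helper below is adapted from the hypothetical "tinyparse" project,
    # which is distributed under the MIT licence. The MIT licence permits reuse,
    # including in proprietary software, provided the copyright and permission
    # notice are retained with the code.
    #
    # Copyright (c) 2021 Example Author
    # Licensed under the MIT License (full text kept at LICENSES/tinyparse-MIT.txt)

    def split_key_value(line: str) -> tuple[str, str]:
        """Split a 'key=value' line into its key and value parts."""
        key, _, value = line.partition("=")
        return key.strip(), value.strip()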

Compliance issues that we have observed in practice include: unknowingly committing to an obligation to make all changes to the OSS available downstream together with the original software licence (and not doing so), and then later being faced with the threat of enforcement action by the OSS’ authors; having proprietary code (allegedly) ‘infected’ by an OSS licence; and claims for compensatory damages in the event of seriously malfunctioning software. All these issues are likely to raise red flags for potential investors and buyers of technology projects and may be used to lower the sale or share price.

What do you need to do?

Make sure you review and comply with the licence terms for OSS, but be aware that, at the time of writing, few AI code assistants offer attribution (although Copilot appears to be working on a solution), let alone enable easy access to the applicable licence terms. When using remotely hosted AI code assistants (e.g. through API calls made by a plugin in your IDE), be aware that your own proprietary code may also become part of the training data, depending on the AI code assistant’s terms of service.

In addition to the Acceptable Use Policy for GAI or an amended OSS Policy, consider the following measures:

  • developer awareness training on common licensing requirements and pitfalls
  • only using AI code assistants that have been trained on internal (or otherwise vetted) source code
  • using filters that prevent the output of code that is identical to code available at the major repositories
  • regularly scanning your entire code base with software composition analysis tools, such as Black Duck, to detect the presence of undeclared OSS (a minimal illustration of the idea is sketched below).
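The following is a minimal sketch of the idea behind such scans, assuming a simple keyword-based approach; it is not a substitute for a proper software composition analysis tool such as Black Duck. It walks a source tree and flags files containing common licence-indicating phrases so that they can be reviewed manually. The file extensions, keywords and default path are illustrative only.

    import os

    # Phrases that commonly indicate the presence of third-party licensed code.
    LICENCE_KEYWORDS = [
        "gnu general public license",
        "apache license",
        "mozilla public license",
        "mit license",
        "copyright (c)",
    ]

    SOURCE_EXTENSIONS = {".py", ".js", ".ts", ".java", ".c", ".cpp", ".go", ".rs"}

    def scan_tree(root: str) -> list[tuple[str, str]]:
        """Return (path, keyword) pairs for files containing licence-indicating phrases."""
        findings = []
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                if os.path.splitext(name)[1] not in SOURCE_EXTENSIONS:
                    continue
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                        text = handle.read().lower()
                except OSError:
                    continue
                for keyword in LICENCE_KEYWORDS:
                    if keyword in text:
                        findings.append((path, keyword))
        return findings

    if __name__ == "__main__":
        # "." (the current directory) is an illustrative default starting point.
        for path, keyword in scan_tree("."):
            print(f"{path}: contains '{keyword}' - review the applicable licence terms")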

At Taylor Wessing, we have extensive experience in drafting the required policies, training developers on licensing requirements, and offering strategic advice during due diligence processes, so feel free to reach out to discuss further.
