June 10, 2026
The use of AI-powered coding tools (code completion, chat-based assistants, agents, and other tools) has rapidly evolved from an “experiment” to a productivity-enhancing standard in software development. In tech stacks heavily reliant on open source, this shifts the risk profile: What is new is not the OSS component itself but “AI-generated code” as an additional, potentially non-transparent source of third-party content, with implications for copyright, licensing, trade secret protection, and security.
As with traditional OSS usage, the following applies: There is no legal vacuum; usage and compliance requirements must be managed both organizationally and technically. Robust OSS governance is the foundation for this.
Recent real-world cases show why this is necessary. The best-known example is the “chardet case,” which quickly became widely known:
March 2026: The long-time maintainer of an LGPL-licensed Python library had the entire codebase regenerated using Claude Code in about five days and published the result under the permissive MIT license. According to a JPlag analysis, the structural similarity to the previous version was less than 1.3%. The original author disagrees: This is not a “clean room” implementation, as the maintainer had years of access to the LGPL code and the LLM demonstrably accessed metadata from the LGPL version.
The Free Software Foundation has stated: “There is nothing ‘clean’ about a Large Language Model which has ingested the code it is being asked to reimplement.” The Software Freedom Conservancy announced a formal analysis on March 27, 2026.
In academia, this phenomenon is already referred to as “copyleft laundering” - the systematic circumvention of copyleft obligations through AI-assisted reimplementation.
The episode makes one thing clear: Companies that use OSS components must now verify not only the provenance of their own AI output but also that of their upstream dependencies. Was a dependency relicensed through an AI rewrite whose legality is unclear?
To mitigate risks, companies must be aware of the dangers of AI-assisted coding:
AI output typically lacks a traceable chain of origin. Developers receive snippets, patterns, or entire functions without being able to reliably assess whether these were (a) originally generated by the AI, (b) based on open-source code, or (c) in fact a (partial) reproduction of specific third-party code passages. This is critical in the OSS context because licensing obligations are tied to specific parts of the work and their distribution/integration.
Even short snippets can be license-relevant (depending on protectability/level of creativity and specific adoption). With copyleft licenses (GPL family, partly AGPL), the risk increases when AI generates output that is functionally or textually closely based on copyleft-licensed code and is subsequently checked into proprietary components.
AI tools produce not only code but also “legal” accompanying statements (e.g., “this is MIT,” “this is freely usable,” “no copyleft”). This leads to false compliance: developers rely on unsubstantiated statements rather than verifiable license information (repository, LICENSE file, header, SPDX).
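One way to replace an unsubstantiated statement with a verifiable one is to read the license tag from the file header itself. The following is a minimal sketch, assuming the file carries an `SPDX-License-Identifier:` tag; the function names `declared_license` and `claim_is_substantiated` are hypothetical, and a file without such a tag still needs a manual check of the repository and LICENSE file.

```python
import re

# Matches the standard SPDX short-form tag anywhere in a source file.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([^\n]+)")


def declared_license(source_text):
    """Return the SPDX expression declared in a file header, or None."""
    match = SPDX_TAG.search(source_text)
    if match is None:
        return None
    expression = match.group(1).strip()
    # Drop a trailing block-comment closer such as "*/" or "-->".
    for closer in ("*/", "-->"):
        if expression.endswith(closer):
            expression = expression[: -len(closer)].strip()
    return expression


def claim_is_substantiated(claimed, source_text):
    """True only if the file itself declares the license the AI tool claimed."""
    declared = declared_license(source_text)
    return declared is not None and declared.lower() == claimed.strip().lower()
```

A claim such as "this is MIT" passes only if the header says so; a missing tag yields `False` and signals that the statement is unverified, not that it is wrong.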
Assisted programming typically works with source code, tickets, logs, architecture diagrams, or customer data in the prompt context. Depending on the tool setup (cloud backend, telemetry, training/retention), there is a risk of disclosure and loss of control over trade secrets, security-relevant information, or personal data.
AI can suggest insecure patterns (missing input validation, insecure cryptography, SSRF/SQLi), recommend outdated dependencies, or “sneak in” new third-party components without being noticed. Furthermore, agents can make changes automatically, thereby bypassing traditional control points (code reviews, dependency governance) if processes are not adapted.
If AI output contains copyright-relevant content and is used without a proper license, a classic copyright risk arises: use without granted rights. In the event of a dispute, injunctions, removal, disclosure, damages, and recall/stop scenarios are on the table—depending on product distribution and the degree of integration.
The copyright issue is compounded by a protection gap: On March 2, 2026, the U.S. Supreme Court denied certiorari in Thaler v. Perlmutter. Works generated purely by AI therefore remain ineligible for copyright protection under U.S. law.
This currently creates a paradox: Anyone who licenses AI-generated code under MIT or BSD may not have any copyright to license in the first place. At the same time, copyleft cannot apply to non-copyrightable output.
Under EU law, there has been no comparable clarification from the highest court to date; according to prevailing opinion, copyright protection under EU law likewise requires a personal intellectual creation, which is likely to be absent in purely machine-generated code.
If AI-generated code is effectively incorporated as a derivative work of an OSS component, OSS licensing obligations may be triggered: attribution, inclusion of license text, provision of source code, copyleft sublicensing, NOTICE files, build scripts, etc. In the case of copyleft, this can lead to a disclosure obligation, as is known from OSS compliance practice.
Trade secrets require appropriate confidentiality measures. A prompting process that transfers source code or internal architecture to external systems without control can weaken the legal position: not only through de facto disclosure, but also by undermining the argument that appropriate protective measures were in place.
If personal data (including in logs or test data) or customer-related information is entered into AI tools, risks arise under data protection law and contractual confidentiality regimes (NDAs, data processing agreements (DPAs), industry requirements). Key issues typically include: role allocation, purpose limitation, transfer, deletion/retention rules, and subprocessors.
In B2B projects, OSS compliance and security are increasingly becoming supply chain requirements (SBOM, third-party notices, audit rights, representations). AI-induced license violations or security flaws thus have repercussions not only internally but also as warranty/damages issues vis-à-vis customers, as contractual penalties, or as deal-breakers in audits/M&A.
(a) Code scanning for license and similarity indicators: Use of SCA/OSS scanners (including snippet/similarity detection, where available) to check AI-generated insertions for potential license origins.
(b) SBOM plus “AI provenance”: Supplementing traditional SBOMs with internal metadata indicating whether and where AI assistance was used (repository/module level) to enable audits and incident response. Additionally, SBOM checks should also cover upstream relicensing: In the chardet case, the default installation via PyPI was automatically updated to the MIT-licensed version. Companies that automatically update dependencies could thus have unknowingly integrated code with an unclear license status into their products.
(c) Policy-driven dependency inclusion: AI must not “silently” introduce new dependencies; every new library goes through the same approval process as manual proposals.
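The upstream relicensing check in (b) and the approval gate in (c) can be sketched as a comparison of two SBOM snapshots. This is a minimal sketch under stated assumptions: the `Component` record, the `diff_sboms` function, and the package names are hypothetical, and a real setup would read the SPDX or CycloneDX documents produced by the build instead of hand-written lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    license_id: str  # SPDX identifier as recorded in the SBOM


def diff_sboms(old, new, approved):
    """Compare two SBOM snapshots and return findings that need review.

    old, new -- lists of Component (previous and current build)
    approved -- set of package names that passed the approval process
    """
    findings = []
    old_by_name = {c.name: c for c in old}
    for comp in new:
        previous = old_by_name.get(comp.name)
        if previous is None:
            # A dependency that appeared since the last build must be approved.
            if comp.name not in approved:
                findings.append(
                    f"new dependency without approval: {comp.name} {comp.version}"
                )
        elif previous.license_id != comp.license_id:
            # Same package, different license: possible upstream relicensing.
            findings.append(
                f"license changed for {comp.name}: "
                f"{previous.license_id} -> {comp.license_id}"
            )
    return findings


if __name__ == "__main__":
    before = [Component("examplelib", "1.0", "LGPL-2.1-or-later")]
    after = [
        Component("examplelib", "2.0", "MIT"),
        Component("newlib", "0.1", "MIT"),
    ]
    for finding in diff_sboms(before, after, approved={"examplelib"}):
        print(finding)
```

Run on every automated dependency update, such a check would turn a silent license change into a finding that someone has to review before the update is merged.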
(a) Clear rules on when AI output should be treated as third-party code: Practical standard: Every non-trivial AI snippet is treated as external third-party content (review, scan, attribution check).
(b) Define copyleft risk paths: Specify technical coupling and distribution scenarios (linking, SaaS/AGPL, container distribution) and “no-go” zones (e.g., no AI output in core IP without scanning/review).
(c) Vendor contracts and tool toggles: Ensure that (i) no training use/retention occurs without approval, (ii) subprocessors/transfers are controlled, (iii) audit and deletion commitments are documented, (iv) IP regulations regarding output are clear (no unexpected transfers of rights to providers).
(a) AI-specific PR checks: "AI assisted?" field in the PR template + automated checks (SCA, secrets scan, license scan).
(b) Two-step review for sensitive modules: core algorithms, security-relevant code, cryptography, license exposure paths.
(c) Incident Response Playbook: Procedure for suspected unlicensed use (quarantine, replacement, attribution, retroactive licensing, disclosure, customer communication).
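The PR check in (a) can be sketched as a small gate that a CI job runs against the PR description. This is a sketch under assumptions, not a prescribed workflow: the field syntax (`AI assisted?: yes/no`), the check names, and the `check_pr` function are hypothetical, and the rule that the three scans are mandatory when the answer is "yes" is one possible reading of the item above.

```python
import re

# Hypothetical names of CI checks that must pass for AI-assisted changes.
REQUIRED_CHECKS_FOR_AI = {"sca", "secrets-scan", "license-scan"}

# Accepts lines such as "AI assisted?: yes" or "AI assisted? no".
AI_FIELD = re.compile(
    r"^\s*AI assisted\?\s*:?\s*(yes|no)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def check_pr(body, passed_checks):
    """Return a list of problems; an empty list means the PR may proceed."""
    match = AI_FIELD.search(body)
    if match is None:
        return ['missing or unanswered "AI assisted?" field']
    if match.group(1).lower() == "no":
        return []
    missing = sorted(REQUIRED_CHECKS_FOR_AI - set(passed_checks))
    return [f"AI-assisted PR lacks passing check: {name}" for name in missing]
```

Treating an unanswered field as a failure keeps the declaration from being skipped silently; the answer itself can then feed the provenance metadata kept alongside the SBOM.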
(a) Control data flow: Prefer enterprise/self-hosted options or configurations with training disabled and defined retention.
(b) Tenant separation and access: SSO, role-based permissions, logging, repository scopes; no private accounts for corporate code.
(c) Red-flag prohibitions: No prompting with customer source code, production logs, credentials, security findings, or unpublished patent applications/inventions.
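The prohibition in (c) can be backed by a technical pre-check that inspects a prompt before it leaves the company network. The following is a minimal sketch, assuming a few regular-expression heuristics; the pattern set and the `red_flags` function are hypothetical, they only cover the credentials category, and they will miss many secrets that a dedicated secrets scanner would catch.

```python
import re

# Heuristic patterns only; labels and coverage are illustrative assumptions.
PATTERNS = {
    # AWS access key IDs for long-term credentials start with "AKIA".
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM-encoded private keys, with or without an algorithm prefix.
    "private-key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Assignments such as "password = ..." or "api_key: ...".
    "password-assignment": re.compile(
        r"(?i)\b(?:password|passwd|secret|api[_-]?key)\s*[:=]\s*\S+"
    ),
}


def red_flags(prompt):
    """Return the labels of all patterns found in the prompt text."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(prompt)]


def may_send(prompt):
    """True if no red-flag pattern matched; callers should block otherwise."""
    return not red_flags(prompt)
```

A filter like this is a backstop for the policy, not a substitute for it: customer source code or unpublished inventions cannot be recognized by pattern matching and remain a matter of training and access control.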
An OSS policy governs third-party software. An AI Use Policy governs third-party output. In terms of content, it should at least define:
Short rules that work in practice:
Assisted programming shifts the compliance question from “Which OSS is in the product?” to “What third-party content has made its way into the code—and can I prove it?”. Companies that treat AI coding tools as another supply chain source (policy, technical controls, SBOM/provenance, contract and data protection setup) not only reduce liability risk but also increase audit and transaction security. The structures that have proven themselves in OSS compliance serve as the obvious blueprint here.
Developments since March 2026 confirm the urgency of such measures in a new way: The question continues to shift—from “What third-party content has made its way into the code?” to “Can we prove that our entire supply chain - including upstream dependencies - is compliant?” The chardet case shows that AI rewrites can erode the economic foundation of copyleft licenses and that the legal classification of AI-generated code - both at the EU level (GEMA v. OpenAI, TDM exception) and in the U.S. (Thaler v. Perlmutter, copyright protection) - will take years to resolve.
Those who establish robust OSS and AI governance now are safeguarding themselves not only against current regulatory requirements but also against those that remain unknown.