Author

Nils von Reith

Senior Associate

Frankfurt

June 10, 2026

AI and Assisted Programming in Open Source: Current Cases, Legal Risks, Compliance by Design

In-depth analysis

The use of AI-powered coding tools (code completion, chat-based assistants, agents, and similar tools) has rapidly evolved from an “experiment” into a productivity standard in software development. In tech stacks that rely heavily on open source, this shifts the risk profile: what is new is not the OSS component itself but “AI-generated code” as an additional, potentially non-transparent source of third-party content, with implications for copyright, licensing, trade secret protection, and security.

As with traditional OSS usage, the following applies: There is no legal vacuum; usage and compliance requirements must be managed both organizationally and technically. Robust OSS governance is the foundation for this.

Current real-world cases show why this is necessary, most prominently the “chardet case,” which became widely known within a short time:

1. From Practice: Copyleft Circumvention via AI Rewrites

March 2026: The long-time maintainer of an LGPL-licensed Python library had the entire codebase regenerated using Claude Code in about five days and published the result under the permissive MIT license. According to JPlag analysis, the structural similarity to the previous version was less than 1.3%. The original author disagrees: This is not a “clean room” implementation, as the maintainer had years of access to the LGPL code and the LLM demonstrably accessed metadata from the LGPL version.

The Free Software Foundation clarifies: “There is nothing ‘clean’ about a Large Language Model which has ingested the code it is being asked to reimplement.” The Software Freedom Conservancy announced a formal analysis on March 27, 2026.

In academia, this phenomenon is already referred to as “copyleft laundering”: the systematic circumvention of copyleft obligations through AI-assisted reimplementation.

The episode makes one point clear: companies that use OSS components must now verify not only the provenance of their own AI output but also upstream provenance. Was a dependency possibly relicensed through an AI rewrite whose legality is unclear?

2. Risks of AI-Assisted Coding in the OSS Context

To mitigate risks, companies must be aware of the dangers of AI-assisted coding:

2.1 Non-transparent code provenance (“Unknown Origin Risk”)

AI output typically lacks a traceable chain of origin. Developers receive snippets, patterns, or entire functions without being able to reliably assess whether these were (a) originally generated by the AI, (b) based on open-source code, or (c) in fact a (partial) reproduction of specific third-party code passages. This is critical in the OSS context because licensing obligations are tied to specific parts of the work and their distribution/integration.

2.2 License and Copyleft “Contamination” via AI Snippets

Even short snippets can be license-relevant (depending on protectability/level of creativity and specific adoption). With copyleft licenses (GPL family, partly AGPL), the risk increases when AI generates output that is functionally or textually closely based on copyleft-licensed code and is subsequently checked into proprietary components.

2.3 Hallucinations and False Compliance

AI tools produce not only code but also “legal” accompanying statements (e.g., “this is MIT,” “this is freely usable,” “no copyleft”). This leads to false compliance: developers rely on unsubstantiated statements rather than verifiable license information (repository, LICENSE file, header, SPDX).
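The point about verifiable license information can be illustrated with a minimal sketch that reads the license statements a Python package actually declares in its core metadata (the PKG-INFO or METADATA file) instead of trusting an assistant's summary. The function name `declared_licenses` and the sample metadata are illustrative assumptions. The sketch only reports what the package declares; it does not establish that the declaration itself is legally sound.

```python
from email import message_from_string


def declared_licenses(metadata_text: str) -> set[str]:
    """Collect license statements from Python core-metadata text.

    Core metadata uses an RFC 822-style header format, so the standard
    email parser can read it. This only reports what the package declares.
    """
    msg = message_from_string(metadata_text)
    found: set[str] = set()
    # SPDX expression field and the older free-text License field.
    for field in ("License-Expression", "License"):
        for value in msg.get_all(field, []):
            if value.strip():
                found.add(value.strip())
    # Trove classifiers such as "License :: OSI Approved :: MIT License".
    for classifier in msg.get_all("Classifier", []):
        if classifier.startswith("License ::"):
            found.add(classifier.split("::")[-1].strip())
    return found


# Hypothetical metadata for illustration only.
SAMPLE_METADATA = (
    "Metadata-Version: 2.1\n"
    "Name: example-lib\n"
    "License: LGPL-2.1-or-later\n"
    "Classifier: Programming Language :: Python :: 3\n"
    "Classifier: License :: OSI Approved :: MIT License\n"
)
```

In this hypothetical sample the free-text field and the classifier disagree, which is exactly the kind of inconsistency that should go to a human reviewer rather than be resolved by an AI statement.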

2.4 Leakage of Secrets and Confidentiality via Prompts/Context

Assisted programming typically works with source code, tickets, logs, architecture diagrams, or customer data in the prompt context. Depending on the tool setup (cloud backend, telemetry, training/retention), there is a risk of disclosure and loss of control over trade secrets, security-relevant information, or personal data.

2.5 Security and supply chain risks

AI can suggest insecure patterns (missing input validation, insecure cryptography, SSRF/SQLi), recommend outdated dependencies, or “sneak in” new third-party components without being noticed. Furthermore, agents can make changes automatically, thereby bypassing traditional control points (code reviews, dependency governance) if processes are not adapted.

3. Legal issues arising from this

3.1 Copyright: unlicensed use and modification

If AI output contains copyright-relevant content and is used without a proper license, a classic copyright risk arises: use without granted rights. In the event of a dispute, injunctions, removal, disclosure, damages, and recall/stop scenarios are on the table—depending on product distribution and the degree of integration.

The copyright issue is compounded by a protection gap: on March 2, 2026, the U.S. Supreme Court denied certiorari in Thaler v. Perlmutter, leaving in place the position that works generated purely by AI cannot claim copyright protection under U.S. law.

This currently creates a paradox: Anyone who licenses AI-generated code under MIT or BSD may not have any copyright to license in the first place. At the same time, copyleft cannot apply to non-copyrightable output.

Under EU law, there has been no comparable clarification from the highest court to date; according to prevailing opinion, copyright protection there likewise requires a personal intellectual creation, which is likely to be absent in purely machine-generated code.

3.2 Open-Source Licensing Law: Cascading Obligations and Loss of Usage Rights

If AI-generated code is effectively incorporated as an OSS-dependent derivative work, OSS licensing obligations may be triggered: attribution, inclusion of license text, provision of source code, copyleft sublicensing, NOTICE files, build scripts, etc. In the case of copyleft, this can lead to a disclosure obligation, as is known from OSS compliance practice.

3.3 Confidentiality Protection: Loss of the “Reasonableness” of Protective Measures

Trade secrets require appropriate confidentiality measures. A prompting process that transfers source code or internal architecture to external systems without control can weaken the legal position: not only through de facto disclosure, but also by undermining the argument that appropriate protective measures were in place.

3.4 Data protection and confidentiality: personal data and customer secrets

If personal data (including in logs or test data) or customer-related information is entered into AI tools, risks arise under data protection law and contractual confidentiality regimes (NDAs, data processing agreements (DPAs), industry requirements). Key issues typically include: role allocation, purpose limitation, transfer, deletion/retention rules, and subprocessors.

3.5 Contractual and liability risks vis-à-vis customers and within the supply chain

In B2B projects, OSS compliance and security are increasingly becoming supply chain requirements (SBOM, third-party notices, audit rights, representations). AI-induced license violations or security flaws thus have repercussions not only internally but also as warranty/damages issues vis-à-vis customers, as contractual penalties, or as deal-breakers in audits/M&A.

4. How these problems can be solved (even after/during coding)

4.1 Technical verification instead of gut feeling

(a) Code scanning for license and similarity indicators: Use of SCA/OSS scanners (including snippet/similarity detection, where available) to check AI-generated insertions for potential license origins.
(b) SBOM plus “AI provenance”: Supplementing traditional SBOMs with internal metadata indicating whether and where AI assistance was used (repository/module level) to enable audits and incident response. SBOM checks should also cover upstream relicensing: In the chardet case, the default installation via PyPI was automatically updated to the MIT-licensed version. Companies that automatically update dependencies could thus have unknowingly integrated code with an unclear license status into their products.
(c) Policy-driven dependency inclusion: AI must not “silently” introduce new dependencies; every new library goes through the same approval process as manual proposals.
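
The upstream relicensing check described in (b) can be sketched as a comparison of two dependency snapshots. This is a minimal illustration, assuming each snapshot has already been reduced to a mapping from package name to declared license (for example, extracted from two SBOM versions). The function name `license_changes` and the snapshot contents are hypothetical, and the license strings are simplified placeholders.

```python
def license_changes(
    previous: dict[str, str], current: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Return (package, old_license, new_license) for every dependency
    whose declared license differs between two snapshots.

    Packages that are new in `current` are not reported here; under the
    policy in (c) they go through the regular approval process instead.
    """
    changes = []
    for name, new_license in sorted(current.items()):
        old_license = previous.get(name)
        if old_license is not None and old_license != new_license:
            changes.append((name, old_license, new_license))
    return changes


# Hypothetical snapshots for illustration only.
BEFORE = {"chardet": "LGPL", "example-lib": "Apache-2.0"}
AFTER = {"chardet": "MIT", "example-lib": "Apache-2.0", "new-lib": "BSD-3-Clause"}

for package, old, new in license_changes(BEFORE, AFTER):
    print(f"{package}: license changed from {old} to {new}; hold update for review")
```

A license change is not a violation in itself. The design choice here is to stop the automatic update and route the change to a reviewer, since a move from a copyleft to a permissive license is the case where the legal basis of the relicensing needs to be checked.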

4.2 Operationalize legal guidelines

(a) Clear rules on when AI output should be treated as third-party code: Practical standard: Every non-trivial AI snippet is treated as external third-party content (review, scan, attribution check).
(b) Define copyleft risk paths: Specify technical coupling and distribution scenarios (linking, SaaS/AGPL, container distribution) and “no-go” zones (e.g., no AI output in core IP without scanning/review).
(c) Vendor contracts and tool toggles: Ensure that (i) no training use/retention occurs without approval, (ii) subprocessors/transfers are controlled, (iii) audit and deletion commitments are documented, (iv) IP regulations regarding output are clear (no unexpected transfers of rights to providers).

4.3 Process: Adjust review and approval points

(a) AI-specific PR checks: "AI assisted?" field in the PR template + automated checks (SCA, secrets scan, license scan).
(b) Two-step review for sensitive modules: core algorithms, security-relevant code, cryptography, license exposure paths.
(c) Incident Response Playbook: Procedure for suspected unlicensed use (quarantine, replacement, attribution, retroactive licensing, disclosure, customer communication).
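
The PR-level controls in (a) and (b) could be wired together roughly as follows. This is a sketch under stated assumptions: the exact wording of the template field ("AI assisted?: yes/no"), the check names, and the function `required_checks` are all hypothetical and would need to match the organization's own PR template and CI jobs.

```python
import re

# Checks that run on every PR (hypothetical job names).
BASE_CHECKS = ["sca", "secrets-scan", "license-scan"]

# Matches a template line such as "AI assisted?: yes" (assumed format).
_AI_FIELD = re.compile(
    r"^\s*AI assisted\?\s*:?\s*(yes|no)\s*$", re.IGNORECASE | re.MULTILINE
)


def required_checks(pr_body: str, touches_sensitive_module: bool = False) -> list[str]:
    """Derive the list of required checks from the PR description."""
    match = _AI_FIELD.search(pr_body)
    if match is None:
        # Fail closed: a missing declaration blocks the PR instead of
        # being silently treated as "no AI assistance".
        return ["declaration-missing"]
    checks = list(BASE_CHECKS)
    if match.group(1).lower() == "yes":
        checks.append("snippet-similarity-scan")
    if touches_sensitive_module:
        checks.append("second-reviewer")
    return checks
```

Treating a missing declaration as a blocking condition keeps the field from becoming optional in practice, which matters if the answers are later used as provenance evidence in an audit.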

5. Ensure AI coding tools are compliant from the start (before coding)

5.1 Tool selection and operating model

(a) Control data flow: Prefer enterprise/self-hosted options or configurations with training disabled and defined retention.
(b) Tenant separation and access: SSO, role-based permissions, logging, repository scopes; no private accounts for corporate code.
(c) Red-flag prohibitions: No prompting with customer source code, production logs, credentials, security findings, or unpublished patent applications/inventions.
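
The red-flag prohibitions in (c) can be backed by a simple pre-flight check on prompt content before it leaves the corporate environment. The following sketch is illustrative only: the pattern set is deliberately small, the names `PROMPT_PATTERNS` and `prompt_findings` are hypothetical, and pattern matching of this kind catches obvious credentials but not customer source code or confidential architecture, which still need organizational rules.

```python
import re

# A deliberately small, illustrative pattern set (not exhaustive).
PROMPT_PATTERNS = {
    # PEM-style private key header.
    "private-key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Common shape of an AWS access key ID.
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Assignments such as "password = ..." or "api_key: ...".
    "credential-assignment": re.compile(
        r"(?i)\b(password|passwd|secret|api[_-]?key|token)\b\s*[:=]\s*\S+"
    ),
}


def prompt_findings(prompt: str) -> list[str]:
    """Return the names of all patterns that match the prompt text."""
    return sorted(
        name for name, pattern in PROMPT_PATTERNS.items() if pattern.search(prompt)
    )
```

A non-empty result would block the prompt or require redaction. In practice such a check would sit in a proxy or IDE plugin in front of the AI tool, which is an architectural assumption not covered by this sketch.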

5.2 Policy: “AI Use Policy” as an annex to the OSS policy

An OSS policy governs third-party software. An AI Use Policy governs third-party output. In terms of content, it should at least define:

permitted tools and configurations (including retention/training off),
permitted input categories and prohibited content (secrets, personal data, customer confidential information),
classification of AI output as “external” above a certain threshold,
mandatory checks (scan, review, attribution),
documentation requirements (e.g., commit tags or PR flags),
escalation procedures (Legal/Compliance) in cases of suspected copyleft or provenance issues,
rules for upstream contributions: the AI Use Policy must define the conditions under which AI-assisted code may be contributed to upstream projects and provide for a review process in accordance with the target project’s contributor policy. Numerous FOSS projects have introduced explicit anti-LLM policies since the chardet incident (including Zig: “strict no-LLM policy”; FreeBSD: rejection of AI-generated code; GNU Guix: “Standing up for human crafting” pledge, May 2026).

5.3 Engineering Controls: Guardrails in CI/CD

Pre-Commit/CI: Secrets scan, license scan, dependency allowlist, notice generator, SBOM generation.
Repository hygiene: SPDX header standards, third-party NOTICE, automated attribution.
Limit agents: No autonomous merge rights; agents work via PRs with review and checks.
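
As one example of such a guardrail, the SPDX header standard can be enforced with a short check in pre-commit or CI. This is a minimal sketch, assuming the identifier appears in a line comment within the first few lines of each file. The function names and the ten-line window are assumptions, and reading files from disk is left out to keep the example self-contained.

```python
import re
from typing import Optional

# Short-form SPDX tag, e.g. "# SPDX-License-Identifier: MIT".
_SPDX_LINE = re.compile(r"SPDX-License-Identifier:\s*(\S.*?)\s*$")


def spdx_identifier(source_text: str, max_lines: int = 10) -> Optional[str]:
    """Return the SPDX expression from the file header, or None if absent.

    Only the first `max_lines` lines are inspected (assumed convention).
    """
    for line in source_text.splitlines()[:max_lines]:
        match = _SPDX_LINE.search(line)
        if match:
            return match.group(1)
    return None


def files_missing_spdx(files: dict[str, str]) -> list[str]:
    """Given a mapping of path -> file content, list paths without a header."""
    return sorted(
        path for path, text in files.items() if spdx_identifier(text) is None
    )
```

A CI job would fail when `files_missing_spdx` returns a non-empty list for the changed files. The check confirms that a header exists, not that the stated license is correct, so it complements rather than replaces the license scan.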

5.4 Training with concrete developer rules

Short rules that work in practice:

AI does not replace license review.
No copy-pasting of large blocks without checking the source.
New dependencies only via the approval process.
If you suspect copyleft/AGPL: stop. Don't "refactor it away"; clarify the situation first.
Prompt contents are potentially public information: no secrets, no personal data.

6. Conclusion: AI coding is manageable - but only as part of OSS governance

Assisted programming shifts the compliance question from “Which OSS is in the product?” to “What third-party content has made its way into the code—and can I prove it?”. Companies that treat AI coding tools as another supply chain source (policy, technical controls, SBOM/provenance, contract and data protection setup) not only reduce liability risk but also increase audit and transaction security. The structures that have proven themselves in OSS compliance serve as the obvious blueprint here.

Developments since March 2026 confirm the urgency of such measures in a new way: The question continues to shift—from “What third-party content has made its way into the code?” to “Can we prove that our entire supply chain - including upstream dependencies - is compliant?” The chardet case shows that AI rewrites can erode the economic foundation of copyleft licenses and that the legal classification of AI-generated code - both at the EU level (GEMA v. OpenAI, TDM exception) and in the U.S. (Thaler v. Perlmutter, copyright protection) - will take years to resolve.

Those who establish robust OSS and AI governance now are safeguarding themselves not only against current regulatory requirements but also against those that remain unknown.

Sectors: Technology, Media and Communications (TMC)