CommonContent

AI Trained on Our Work Without Permission — What the Creative Commons Movement Must Do Next

By Anna Vetrova · May 28, 2026

The question sounds abstract until you have faced it personally. A musician who released their recordings under a Creative Commons Attribution licence ten years ago — inviting anyone to use, remix, and share the work freely — discovers that their voice, their melodic phrasing, their distinctive production choices have been absorbed into a generative AI music system. The system now produces outputs that sound uncannily like them. When they enquire about compensation, attribution, or opt-out, they are told the training data corpus is proprietary and the question of what was in it cannot be confirmed or denied.

This is not a hypothetical. It is the situation that creators who committed to open sharing in good faith now find themselves in, and it is the defining tension in the Creative Commons movement in 2026.

Creative Commons was founded on a specific conviction: that sharing creative work under clear, machine-readable licences would grow a commons of culture that anyone could build on, remix, and extend. The licences were designed for a world of human creative reuse. When a musician released under CC BY, they imagined other musicians, filmmakers, educators, and journalists finding their work and using it with attribution. The licence language does reach machine processing, but only in the general sense that it sets out what any person or entity may do with the work. The drafters of the CC licences in 2001 and 2002 were imagining human creative reuse, not the ingestion of billions of works into statistical models that would then compete with the very creators whose works made those models possible.

What CC Licences Actually Say — and What They Do Not

To have this conversation with any precision, you need to understand what current CC licences permit and where the gaps are.

The Creative Commons family of licences is built on copyright. They are legal tools that grant permissions beyond what copyright law's defaults allow — but they operate within the copyright framework. If a work is not protected by copyright, CC licences have nothing to attach to. If a work is protected by copyright, the licence grants specific permissions to anyone who agrees to the licence terms.

The Attribution licence, CC BY, permits copying, distribution, modification, and adaptation for any purpose, including commercial purposes, as long as the original creator is attributed. The ShareAlike variant adds the requirement that any adaptations be shared under the same or compatible terms. The NonCommercial variants restrict use to non-commercial purposes. The NoDerivatives variants prohibit sharing adaptations of the work.
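
For readers who think in code, the four licence elements can be modelled as a small set of flags. The sketch below is a deliberate simplification for illustration, not legal advice and not an official Creative Commons tool; the names `Licence` and `conditions_for` are invented here, and real licences combine these elements (for example BY-NC-SA) and carry further terms that the sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """Simplified model of a CC licence; attribution is always required."""
    name: str
    share_alike: bool = False
    non_commercial: bool = False
    no_derivatives: bool = False


CC_BY = Licence("CC BY")
CC_BY_SA = Licence("CC BY-SA", share_alike=True)
CC_BY_NC = Licence("CC BY-NC", non_commercial=True)
CC_BY_ND = Licence("CC BY-ND", no_derivatives=True)


def conditions_for(licence, commercial, shares_adaptation):
    """Return the conditions attached to a reuse, or None if it is not permitted.

    `commercial` and `shares_adaptation` describe the intended reuse.
    """
    if commercial and licence.non_commercial:
        return None  # NonCommercial rules out commercial use
    if shares_adaptation and licence.no_derivatives:
        return None  # NoDerivatives rules out sharing adaptations
    conditions = ["attribute the original creator"]
    if shares_adaptation and licence.share_alike:
        conditions.append("share the adaptation under the same or compatible terms")
    return conditions
```

Under this toy model, `conditions_for(CC_BY, commercial=True, shares_adaptation=True)` returns only the attribution condition, while the same call with `CC_BY_NC` returns `None`. Notice that nothing in the model says whether training a model counts as an adaptation at all, which is exactly the gap the next paragraph describes.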

The critical question for AI training is whether ingesting a work into a training corpus — processing it through a neural network, allowing its statistical patterns to influence the model's outputs — constitutes "use" or "adaptation" under copyright law and, by extension, under CC licence terms. This is a question that different legal systems are answering differently, and that has not been definitively resolved in any major jurisdiction.

In the United States, several pending cases have raised the question of whether AI training on copyrighted material constitutes fair use. The legal community is divided, and the courts have not yet issued the definitive rulings that would settle the question. In the European Union, the text and data mining exception introduced by the Copyright in the Digital Single Market Directive allows AI training on works obtained from lawful sources, subject to an opt-out mechanism that rightholders can exercise. The practical effectiveness of that opt-out has been questioned — it places the burden of action on individual creators, requires technical implementation that most creators cannot independently deploy, and does not apply to training data that was scraped before the opt-out was established.

For works released under CC licences, the situation is complicated by the fact that the licences were designed to expand permissions, not to restrict them. A CC BY licence grants permissions beyond what copyright's defaults allow. It does not address AI training specifically because AI training as currently practised did not exist when the licences were designed and has not been addressed through licence revision to date.

The Signals Framework and What It Means

Creative Commons has been developing what it calls CC Signals — a mechanism that would allow creators to express their preferences about AI training beyond what the existing licences specify. The idea is that a creator releasing work under CC BY might want to indicate "use for training permitted" or "training use not permitted" or "training permitted with conditions including compensation," separately from the permissions granted by the CC licence itself.

The Signals framework is not a new licence — it is a layer of preference expression that sits above the existing licence structure. It is designed to be machine-readable, so that AI developers building training data pipelines can query the signal and act on it. It is designed to be human-readable, so that creators can express preferences without requiring legal expertise.
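
To make "query the signal and act on it" concrete, here is a minimal sketch of how a training data pipeline might filter a corpus by a creator's stated preference. Everything in it is an assumption made for illustration: the `TrainingSignal` values simply mirror the three example preferences described above and are not the official CC Signals vocabulary, and the `Work` record and `select_for_training` function are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TrainingSignal(Enum):
    # Hypothetical values, not the official CC Signals vocabulary.
    PERMITTED = "permitted"
    NOT_PERMITTED = "not-permitted"
    CONDITIONAL = "conditional"


@dataclass(frozen=True)
class Work:
    title: str
    licence: str                       # e.g. "CC BY 4.0"
    signal: Optional[TrainingSignal]   # None: no preference was expressed


def select_for_training(works, accept_conditions=False):
    """Keep only works whose signal explicitly allows training.

    Works with no signal are excluded. That is a policy choice made for
    this sketch, not something the licence or the framework dictates.
    """
    allowed = {TrainingSignal.PERMITTED}
    if accept_conditions:
        # The pipeline operator has agreed to meet the creator's conditions.
        allowed.add(TrainingSignal.CONDITIONAL)
    return [w for w in works if w.signal in allowed]


corpus = [
    Work("A", "CC BY 4.0", TrainingSignal.PERMITTED),
    Work("B", "CC BY 4.0", TrainingSignal.NOT_PERMITTED),
    Work("C", "CC BY 4.0", TrainingSignal.CONDITIONAL),
    Work("D", "CC BY 4.0", None),
]

print([w.title for w in select_for_training(corpus)])
print([w.title for w in select_for_training(corpus, accept_conditions=True)])
```

All four works carry the same licence, yet the pipeline treats them differently, which is the sense in which a signal is a separate layer from the licence. How a pipeline should treat works with no signal at all is the kind of default that a real standard would have to settle.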

The project is significant because it attempts to solve the coordination problem that currently produces bad outcomes for everyone. AI developers genuinely do not always know the preferences of the creators whose work they are ingesting — the training corpus is assembled at scale and individual creator preferences are not systematically captured. Creators do not have a standard way to express those preferences in a form that AI systems can read. The Signals framework attempts to create that standard.

Whether it will work depends on adoption. A machine-readable signal is only useful if AI developers building training pipelines implement the infrastructure to read and act on it. Creative Commons has been working with major AI companies and with research institutions to make that implementation possible. The outcome of that work is not yet determined, but the direction is right — if the problem is a coordination failure, a coordination mechanism is the correct response.

What the Training Data Debate Reveals About Open Culture

Beneath the licensing and technical questions, the training data debate exposes a deeper problem. The Creative Commons movement has historically understood open sharing as something that flows in one direction: creators share, users benefit, culture grows. The expectation was that the beneficiaries of open culture would be individuals, communities, educators, and creative practitioners who would use openly shared work to make new things.

The AI training data situation has introduced a new category of beneficiary: technology companies that use open culture at scale to build commercial products that compete with the creators who made that openness possible. The economics are asymmetric in a way that challenges the foundational assumptions of the CC project. A musician who releases work under CC BY is, in a genuine sense, contributing to the public good. A generative AI music company that trains on millions of such works and deploys a commercial service is benefiting from that public good without contributing to it. In doing so, it may shrink the market for the very kind of music that those CC-licensed works taught its model to produce.

This asymmetry does not mean that open sharing was wrong, or that CC licences were a mistake. It means that the environment in which open sharing happens has changed in ways that require the movement to adapt. Creative Commons acknowledged this directly in its January 2026 statement about the year ahead: "Advances in AI and shifts in the technological environment have unsettled long-standing motivations to share openly." That is a careful and honest formulation. The motivation to share — to contribute to a commons that benefits everyone — is weakened when it becomes unclear that the commons actually benefits everyone equitably.

Three Things the Movement Should Do in 2026

The Creative Commons community cannot solve the AI training data problem alone; that also requires legal reform, technical standards, and commercial decisions by AI companies. But there are three things the movement can do that would meaningfully advance the cause.

First, update the licence framework to address AI training explicitly. The CC licences have been revised four times since version 1.0 was released in 2002. A fifth revision that specifically addresses machine learning training (clarifying what the existing licences permit, creating a training-specific licence option, and establishing clear terms for the compensation arrangements that some creators want) would provide the legal clarity that the training data debate currently lacks. This is complex work, but it is the kind of work that Creative Commons exists to do.

Second, invest in Signals infrastructure and push for adoption. The Signals framework is the right approach to the coordination problem. Making it work requires technical infrastructure that most creators cannot build themselves, and adoption by AI developers, which will take sustained engagement. The movement should treat Signals infrastructure as core mission work rather than as an experimental project.

Third, build the case for reciprocal contribution to the commons. AI companies benefit from the open culture that CC has helped build. The movement should develop clear, evidence-based arguments for why those companies should contribute back — through financial support for CC infrastructure, through commitments to use Signals and respect creator preferences, and through participation in the governance of the standards that shape how AI training happens. This is an advocacy project, and it is one that the CC community is uniquely positioned to lead.

The commons has been built by millions of creators who chose to share. It deserves to be defended by an equally serious commitment to ensuring that sharing remains meaningful — that the people who give to the commons receive something from it in return, and that the infrastructure of open culture is not quietly privatised by the systems built upon it.