We analyzed tens of court cases and licensing deals to answer the key questions about copyright and generative AI. This is not legal advice. Copyright law varies by jurisdiction and is evolving fast.
The Three Big Questions
- Can copyright-protected data be used as training data? In the US, early court rulings indicate that training on copyrighted works is likely fair use if the copies were obtained legally. Downloading them from pirate sites is not.
- Are AI-generated works eligible for copyright protection? In most countries, substantial human involvement is required for eligibility.
- Who owns the copyright in AI-generated work? It depends on who is recognized as the creator of the work. So far, no copyright has been awarded to a machine or software.
1. Can copyright-protected data be used as training data?
In the US and several other jurisdictions, the legality of using copyrighted works to train AI models is being actively litigated, and courts are beginning to draw lines. The picture that has emerged since mid-2025 is more nuanced than either side claimed: how the training data was sourced matters more than the act of training itself.
USA
Two landmark rulings in June 2025 gave AI companies their first major courtroom wins. In Bartz v. Anthropic, Judge William Alsup of the Northern District of California ruled that using books to train Claude was fair use when those books were legally acquired, comparing it to a human reading widely to learn to write. Days later, in Kadrey v. Meta, Judge Vince Chhabria reached a similar result for Meta’s training practices, though on narrower grounds: he held that the plaintiffs had failed to show the market harm needed to defeat fair use.1
However, the Bartz ruling drew a sharp distinction: legal acquisition is the threshold. Judge Alsup ruled separately that Anthropic’s downloading of pirated books from the shadow libraries Library Genesis (LibGen) and Pirate Library Mirror (PiLiMi) was not protected by fair use. That finding drove Anthropic to settle the class action for $1.5 billion in August 2025, the largest copyright recovery in U.S. history, covering roughly 500,000 works at approximately $3,000 per title.2 Anthropic also agreed to destroy the original pirated files.
The settlement resolved past claims only. It does not license Anthropic’s future training or cover outputs from its models. Claims based on conduct after August 25, 2025, are explicitly excluded.3
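The per-title figure reported for the Bartz settlement is simple division. The sketch below assumes, for illustration only, that the full fund is split evenly across the roughly 500,000 covered works before any fees or costs are deducted:

```latex
\[
\frac{\$1{,}500{,}000{,}000}{500{,}000\ \text{works}} = \$3{,}000\ \text{per work}
\]
```

Actual payouts per claimant can differ, since the work count is approximate and any deductions are not reflected in this calculation.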
The New York Times litigation against OpenAI and Microsoft, filed in December 2023, is advancing through discovery. On January 5, 2026, Judge Sidney Stein ordered OpenAI to produce its entire 20-million-log sample of anonymized ChatGPT conversations, not only those logs that specifically referenced plaintiffs’ works, as OpenAI had proposed. The ruling was a significant discovery victory for media plaintiffs and could complicate OpenAI’s fair use defense if the logs show ChatGPT routinely generating content that substitutes for paywalled journalism.4 As of March 2026, OpenAI is seeking to depose the Times’ expert consultant who created exhibits demonstrating alleged reproduction of its articles.5
Publishers Hachette Book Group and Cengage Group moved in January 2026 to join a proposed class action against Google over alleged misuse of copyrighted material for AI training, signaling that institutional publishers, not just individual authors, are increasingly entering the litigation as plaintiffs.6
More than 50 copyright cases against AI companies are currently pending in U.S. federal courts. No further fair use rulings are expected before summer 2026.7
France
France’s competition authority (Autorité de la concurrence) fined Google €250 million in March 2024 for breaching its commitments to news publishers, citing among other things its use of news content to train its Bard chatbot (now Gemini) without notifying publishers.8 This was a regulatory enforcement action, not a copyright ruling, but it shows that European regulators are willing to act on unauthorized use of journalistic content in AI systems.
United Kingdom
The UK High Court issued the first UK judgment directly addressing copyright infringement in the development of generative AI, in Getty Images v. Stability AI. The court rejected Getty’s secondary copyright infringement claim, finding that Stable Diffusion’s model weights did not constitute ‘infringing copies’ under UK law. Getty did win a narrow trademark infringement finding on watermark reproduction by early model versions but was ordered to pay 69.4% of Stability’s costs, making the victory financially Pyrrhic.9
European Union
The EU AI Act is the most significant new regulatory development for AI and copyright globally. Under Article 53, all providers of general-purpose AI (GPAI) models, including foundation models like GPT, Claude, and Gemini, must publish a structured public summary of their training data and implement a policy complying with EU copyright law, including respecting opt-outs under the EU Copyright Directive’s text and data mining exception.
The European Commission published its mandatory template for training content disclosures on July 24, 2025.10
Japan
Japan’s approach remains the most permissive among major economies. Copyrighted works can generally be used for AI training provided the material itself is not from infringing sources, and the use does not unreasonably harm the copyright holder’s interests.11
Fair use vs copyright infringement
Intellectual property law protects the rights of creators and owners of creative works, including writings, music, software, and designs. Copyright infringement carries serious legal consequences, including imprisonment in some jurisdictions. Claiming ignorance of IP law provides no defense against liability.
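Fair use (in the US) and equivalent doctrines in other jurisdictions allow limited use of copyrighted material without permission for purposes such as criticism, commentary, news reporting, teaching, or research. In the US, courts weigh four factors to decide whether a use qualifies: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the market for the original.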
Based on rulings through early 2026, the clearest picture of where fair use applies in AI training is:
- Likely fair use: Training on legally obtained works where the model is used for research, deployed in a constrained non-substitutive task, or generates outputs that do not compete with the source material.
- Likely not fair use: Training on pirated works; using vast amounts of copyrighted material to generate commercial output that competes with and substitutes for that material in existing markets.
- Outputs are a separate question: Training a model on lawfully obtained works does not automatically make the model’s outputs lawful. If a model can reproduce substantial portions of protected works, the output and downstream users may face separate infringement claims.
The US Copyright Office’s Part 3 report found that model weights themselves may infringe the reproduction right if they have memorized substantial protectable expression from training data. This opens potential liability not just at training time but when models are distributed, fine-tuned, or deployed by third parties.
2. Are AI-Generated Works Eligible for Copyright Protection?
Whether AI-generated works can be protected by copyright depends on jurisdiction, but the common thread across every country that has addressed the question is the same: human authorship is required.
The US Copyright Office’s January 2025 Part 2 report confirmed that AI outputs qualify for copyright protection only where humans provide sufficient creative input. The threshold is not minimal; writing a text prompt does not qualify.
AI-assisted artwork received copyright protection
In September 2022, the US Copyright Office registered the comic book Zarya of the Dawn, created with the text-to-image AI tool Midjourney.12 The author described the artwork as AI-assisted rather than solely AI-generated: she structured the story, designed the page layouts, and made artistic decisions in arranging the elements alongside the AI-generated images. In February 2023, the Office narrowed the registration, protecting the text and the selection and arrangement of the images but not the individual Midjourney-generated images themselves.
Figure 1. Drawings from the last page of the AI-assisted comic book Zarya of the Dawn. (Source: Zarya of the Dawn)
The award-winning Midjourney image was denied copyright protection.
Another controversial example of generative art is an AI-generated print that won a competition at the Colorado State Fair.13 The creator said he spent weeks refining his prompts and manually selecting the finished image. The award-winning AI-generated art is shown in Figure 2 below.
Figure 2. The award-winning AI-generated print Théâtre D’opéra Spatial. (Source: The Verge)
This image was denied copyright protection.14 Ultimately, whether AI-generated works are eligible for copyright protection is tied to the question of ownership: who, if anyone, would hold the copyright. Countries that require human authorship generally deny copyright protection to AI-generated works.
3. Who Owns the Copyright in AI-Generated Work?
In most countries, copyright law assigns ownership to the creator of a work. When AI produces the work, the question of who the creator is and, therefore, who owns it has no settled universal answer.
The programmer approach: The UK, India, Ireland, New Zealand, and Hong Kong allow programmers to claim authorship of computer-generated works. The ‘person by whom the arrangements necessary for the creation of the work are undertaken’ owns the copyright.15
Problem: What about the creators of the training data? If an AI trained on Rembrandt paintings generates new artwork, does the programmer get full credit while Rembrandt’s contribution is ignored?
Figure 3. “The Next Rembrandt” is a computer-generated 3D painting that was inspired by the real paintings of 17th-century Dutch painter Rembrandt. (Source: The Guardian)
The user approach: If a person provides substantial creative direction beyond simple prompts, they may qualify as an author. Courts are still defining what ‘substantial’ means in this context.
The AI-as-author approach: Stephen Thaler sued the US Copyright Office in 2022, arguing that his AI system should be recognized as the author of its own works. Courts rejected this at every level. As of March 2026, the Supreme Court of the United States has declined to hear his challenge, leaving in place the Copyright Office’s position that copyright requires human authorship. No jurisdiction currently recognizes AI as a legal person capable of holding copyright.16
How AI Companies Are Actually Handling This
Licensing Deals
Major AI companies have pursued content licensing aggressively, either because they believe it is legally necessary or to reduce litigation risk. Key deals include:
OpenAI: Financial Times (April 2024), Vox Media (May 2024), The Atlantic (May 2024), Reddit ($70M/year), and multiple other publishers.17
Google: Reddit licensing deal (February 2024), multiple news organizations.
Shutterstock: Reported $104 million in AI licensing revenue in 2024.18
Music: Universal Music Group settled its lawsuit against Udio in October 2025, establishing a licensing deal with an opt-in structure for artists. Warner Music Group settled a parallel lawsuit against Suno in November 2025. Udio plans to launch a new subscription service built on fully licensed music in 2026.19
The volume of licensing activity is itself evidence that the industry does not consider fair use a guaranteed defense, particularly for commercial-scale use of creative content.
Generative AI Copyright Best Practices
For Content Creators
- Register your copyrights. In the US, only registered works are eligible for statutory damages, which is the basis for large settlement payouts like Bartz.
- Check the Anthropic settlement Works List at AnthropicCopyrightSettlement.com. If your work appears and you have not yet filed a claim, the deadline is March 30, 2026.
- The opt-out window for the Anthropic settlement closed on February 9, 2026. If you did not opt out before that date, you are in the class. The final court approval hearing is April 23, 2026.20
- Use opt-out mechanisms offered by AI companies for future training. Many major providers now allow rights holders to request exclusion of their works from training datasets.
- Document your creative process when producing AI-assisted work. The more thoroughly you can show that a human made the meaningful decisions, the stronger any copyright claim will be.
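For work published on a website you control, one widely used opt-out mechanism is robots.txt. The sketch below is illustrative, not a complete opt-out: it assumes three user-agent tokens associated with AI training (OpenAI’s GPTBot, Google’s Google-Extended control token, and Common Crawl’s CCBot), and uses Python’s standard-library `urllib.robotparser` to confirm which agents the rules would block. The URL and the choice of tokens are examples; check each provider’s current documentation, since compliance with robots.txt is voluntary and it only affects future crawling, not data already collected.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt for a site you control. It asks three user-agent
# tokens associated with AI training (GPTBot, Google-Extended, CCBot) to
# stay out, while leaving all other crawlers, such as search indexing, allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def check_agents(parser: RobotFileParser, url: str, agents: list) -> dict:
    """Map each user-agent token to whether the rules allow it to fetch `url`."""
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    page = "https://example.com/articles/my-essay.html"  # hypothetical URL
    agents = ["GPTBot", "Google-Extended", "CCBot", "Googlebot"]
    for agent, allowed in check_agents(rp, page, agents).items():
        print(f"{agent:16} allowed={allowed}")
```

Running it reports `allowed=False` for the three AI-related tokens and `allowed=True` for Googlebot, which falls through to the catch-all `*` rule.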
For Businesses Deploying AI
- Assess your risk tolerance by use case. The legal risk profile of generating marketing copy with an LLM differs significantly from using AI to reproduce or summarize published journalism.
- Document human creative involvement in any AI-assisted outputs you intend to claim copyright in.
- Review what legal protection your AI vendor actually provides. Indemnification clauses vary widely and often exclude claims arising from your prompts.
- Track which AI tools are used across your organization, for what purposes, and what training data they are built on. This is increasingly required under the EU AI Act for companies operating in Europe.
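- The EU AI Act’s obligations for GPAI model providers have applied since August 2, 2025, and the Commission’s powers to enforce them begin on August 2, 2026, the same date most high-risk AI obligations are scheduled to apply. If you use or deploy GPAI models in the EU, your vendors must publish training data summaries and copyright compliance policies.21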
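The tracking advice above can start as a simple inventory. The sketch below is a hypothetical structure, not a compliance tool: the field names, the example tools and vendors, and the rule it applies (flag tools used in the EU on GPAI models where you have not yet located both the vendor’s training data summary and its copyright policy) are assumptions for illustration. The publication duty itself sits with the model provider; this only tracks whether you have found the documents.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AIToolRecord:
    """One row in a hypothetical internal inventory of AI tools."""
    tool: str                                   # name used internally
    vendor: str
    purpose: str                                # e.g. "marketing copy"
    built_on_gpai_model: bool                   # runs on a general-purpose AI model?
    used_in_eu: bool
    training_summary_url: Optional[str] = None  # vendor's public training-data summary, if located
    copyright_policy_url: Optional[str] = None  # vendor's copyright compliance policy, if located


def missing_vendor_disclosures(inventory: List[AIToolRecord]) -> List[str]:
    """List EU-used, GPAI-based tools for which either vendor document is still missing."""
    return [
        rec.tool
        for rec in inventory
        if rec.used_in_eu
        and rec.built_on_gpai_model
        and not (rec.training_summary_url and rec.copyright_policy_url)
    ]


if __name__ == "__main__":
    inventory = [
        AIToolRecord(
            tool="copy-assistant", vendor="Vendor A", purpose="marketing copy",
            built_on_gpai_model=True, used_in_eu=True,
            training_summary_url="https://vendor-a.example/training-summary",
            copyright_policy_url="https://vendor-a.example/copyright-policy",
        ),
        AIToolRecord(
            tool="news-summarizer", vendor="Vendor B",
            purpose="summarizing published articles",
            built_on_gpai_model=True, used_in_eu=True,
        ),
        AIToolRecord(
            tool="internal-classifier", vendor="in-house", purpose="ticket routing",
            built_on_gpai_model=False, used_in_eu=True,
        ),
    ]
    print(missing_vendor_disclosures(inventory))  # ['news-summarizer']
```

Keeping the purpose field alongside the vendor documents also supports the earlier point about assessing risk by use case: a summarizer of published journalism warrants closer review than an internal ticket router.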
For AI Companies
- License proactively. The Anthropic settlement demonstrates that licensing deals, however expensive, are cheaper than the exposure to class-action litigation. The $1.5 billion settlement covered only books; the music publishers’ new $3 billion suit shows the liability can stack across content types.
- Never use pirate sources. Bartz v. Anthropic and the UMG/Concord suit both arose directly from shadow library downloads. The Bartz ruling confirmed that legal sourcing is the decisive threshold for fair use in the US.
- Document data provenance. Know exactly where every piece of training data came from and be able to demonstrate it in litigation and regulatory review.
- Plan for jurisdictional variation. The EU AI Act, Japan’s permissive framework, the UK’s evolving position post-Getty, and the US fair use doctrine are all materially different. A training practice legal in one jurisdiction may not be legal in another.
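The data provenance advice above can be made concrete with a per-item record. This is a minimal sketch under stated assumptions: the `ProvenanceRecord` fields, the example items, and the `APPROVED_ACQUISITION` categories are hypothetical, and which acquisition methods are acceptable is a legal judgment rather than a coding one. It hashes each item with SHA-256 so the exact file can be identified later, and separates items whose acquisition method the policy accepts from those it rejects, such as items of unverified origin.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple

# Hypothetical set of acquisition methods an internal policy might accept.
APPROVED_ACQUISITION = {"purchased", "licensed", "public_domain", "first_party"}


@dataclass(frozen=True)
class ProvenanceRecord:
    item_id: str
    source: str        # where the item came from: URL, vendor, or catalog ID
    acquisition: str   # how it was obtained, e.g. "licensed"
    license_ref: str   # contract or license identifier, or "n/a"
    acquired_on: date
    sha256: str        # content hash, so the exact file can be identified later


def make_record(item_id: str, source: str, acquisition: str,
                license_ref: str, acquired_on: date, content: bytes) -> ProvenanceRecord:
    """Create a record, hashing the raw bytes that would enter the training set."""
    digest = hashlib.sha256(content).hexdigest()
    return ProvenanceRecord(item_id, source, acquisition, license_ref, acquired_on, digest)


def split_by_policy(
    records: Iterable[ProvenanceRecord],
) -> Tuple[List[ProvenanceRecord], List[ProvenanceRecord]]:
    """Separate records whose acquisition method the policy accepts from those it rejects."""
    approved: List[ProvenanceRecord] = []
    rejected: List[ProvenanceRecord] = []
    for rec in records:
        (approved if rec.acquisition in APPROVED_ACQUISITION else rejected).append(rec)
    return approved, rejected


if __name__ == "__main__":
    records = [
        make_record("book-001", "Publisher X catalog", "licensed", "LIC-2025-17",
                    date(2025, 3, 1), b"full text of a licensed book"),
        make_record("book-002", "unknown mirror site", "unverified", "n/a",
                    date(2025, 3, 2), b"full text of unverified origin"),
    ]
    approved, rejected = split_by_policy(records)
    print([r.item_id for r in approved])  # ['book-001']
    print([r.item_id for r in rejected])  # ['book-002']
```

Rejecting by default (anything not on the approved list) rather than blocking known-bad sources mirrors the Bartz lesson: an item whose origin cannot be shown is treated as a risk.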
Glossary
Copyright: A type of intellectual property (IP) that protects original works of authorship fixed in a tangible form, such as paintings, books, and software. In the US and the EU, copyright generally lasts for the author’s life plus 70 years.
Patents: IP protections for inventions and new processes, differing from copyright by covering functional aspects rather than creative expressions.
Fair use: A legal doctrine allowing limited use of copyrighted material without permission under certain conditions, such as for criticism, comment, news reporting, teaching, or research.
Generative AI: Artificial intelligence systems that create new text, images, videos, and other media, raising debates on copyrightability and ownership of the generated outputs.
Inputs in AI training: The data used to train generative AI models, which can include copyrighted material. Issues arise about whether using such data without permission constitutes copyright infringement.
Outputs in AI: The new works produced by generative AI, such as text or images, and the debate over their copyrightability, given that human authorship is typically required for copyright protection.
Transformative use: A use that adds something new, with a further purpose or different character, rather than substituting for the original work. Transformativeness weighs heavily under the first fair use factor.
Creative control: The level of influence a human has over the creation of a work, which impacts whether AI-generated outputs are deemed copyrightable.
Copyright registration: The process of officially registering a work with the U.S. Copyright Office, which currently requires human authorship for protection.
Cem's work has been cited by leading global publications including Business Insider, Forbes, and the Washington Post; global firms like Deloitte and HPE; NGOs like the World Economic Forum; and supranational organizations like the European Commission.
Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.
He led technology strategy and procurement of a telco while reporting to the CEO. He has also led the commercial growth of deep tech company Hypatos, which reached seven-digit annual recurring revenue and a nine-digit valuation from zero within two years. Cem's work at Hypatos was covered by leading technology publications like TechCrunch and Business Insider.
Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.