
US Court Rules AI Training on Copyrighted Books is Fair Use

Why It Matters

This landmark ruling establishes a legal precedent that AI training is transformative rather than derivative, potentially shielding AI developers from multibillion-dollar copyright liabilities while forcing stricter data-sourcing audits.

Key Points

  • The court ruled that training LLMs on copyrighted works is transformative and constitutes fair use.
  • Converting physical books into digital training data was specifically protected by the ruling.
  • The court explicitly excluded pirated materials from fair use protection, leaving Anthropic and others exposed to liability for 'shadow library' data.
  • Judge Alsup compared AI training to schoolchildren learning to write, rejecting claims that training itself is copyright infringement.
  • The ruling may lead to a shift in AI industry practices toward more rigorous data provenance and cleaning to avoid pirated content.

A United States District Court has ruled in favor of AI company Anthropic, determining that training large language models (LLMs) on copyrighted books constitutes 'fair use' under U.S. law. Judge Alsup likened the process to students learning from books to improve their own writing, rather than verbatim regurgitation. However, the ruling included a significant caveat: while training on legitimately acquired data is protected, the use of pirated materials remains a potential basis for liability. The court also held that converting physical books to digital formats for training purposes is permissible. This decision provides much-needed clarity for the AI industry regarding data acquisition, though it leaves open the possibility of future litigation regarding the 'pirated' status of specific datasets used in foundation model development.

A judge just gave AI companies a huge win by saying it's okay to 'read' copyrighted books to train AI, just like a human student would. The court decided that since the AI creates something new and doesn't just copy-paste the books, it's 'fair use.' But there's a catch: you can't use stolen or pirated books from shady websites. It's like saying you can study a library book to get smarter, but you can't study a book you stole from the back of a truck. This is great for AI progress but still leaves authors worried about their jobs and where their data is coming from.

Sides

Critics

Departed Safety Researchers

Claim that safety protocols and the original mission were sidelined in favor of rapid product shipping.

Ed Newton-Rex

Contends the ruling proves tech companies are losing the long-term legal battle over fair use despite tactical wins.

Authors (Plaintiffs)

Contend that training AI on their books without permission or compensation is copyright infringement.

Defenders

OpenAI Leadership

Argues that commercial success is necessary to fund the massive compute required for AGI development.

Meta

Argued that training generative AI models on public datasets constitutes fair use under copyright law.

Anthropic

Argued that training models on books is a transformative use of data similar to human learning.

Andrew Ng

Supports the ruling as a win for AI progress and reduced regulatory ambiguity, while acknowledging concerns about writer livelihoods.

Neutral

Microsoft

Maintains a strategic partnership focused on integrating OpenAI technology while hedging with other model providers.

Institutional Investors

Focused on the company's valuation and market dominance while wary of the impact of executive turnover.

Industry Analysts

Observe that OpenAI has won the first phase of the AI war but faces a 'winner's curse' regarding sustainability.

Judge Vince Chhabria

Ruled for Meta on technicalities but issued a legal opinion that unlicensed AI training is generally copyright infringement.

Judge Alsup

Authored the ruling that training is fair use but pirated data usage is not protected.

Noise Level

Uproar: 65 (Decay: 99%)
Reach: 55
Engagement: 0
Star Power: 100
Duration: 100
Cross-Platform: 90
Polarity: 75
Industry Impact: 90

Forecast

AI Analysis — Possible Scenarios

The plaintiffs are highly likely to appeal the decision to a federal appeals court. In the short term, AI companies will likely begin aggressive audits of their training sets, removing any data linked to known pirate repositories to comply with the 'legitimate acquisition' standard.
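In practice, the kind of provenance audit described above might start as a simple blocklist pass over dataset records. The sketch below is purely illustrative: the record fields, helper names, and blocklisted domains are all hypothetical placeholders, not any company's actual pipeline.

```python
# Minimal sketch of a training-data provenance audit: drop any record
# whose source URL resolves to a domain on a blocklist of known pirate
# repositories. Domain names and record fields are illustrative only.
from urllib.parse import urlparse

# Hypothetical blocklist; a real audit would maintain a vetted registry.
PIRATE_DOMAINS = {"shadow-library.example", "pirate-mirror.example"}

def is_legitimately_sourced(record: dict) -> bool:
    """Return True if the record's source domain is not blocklisted."""
    host = urlparse(record.get("source_url", "")).hostname or ""
    # Match the blocklisted domain itself and any subdomain of it.
    return not any(host == d or host.endswith("." + d) for d in PIRATE_DOMAINS)

def audit_dataset(records: list[dict]) -> list[dict]:
    """Keep only records that pass the provenance check."""
    return [r for r in records if is_legitimately_sourced(r)]
```

A real cleaning effort would go well beyond domain matching (content fingerprinting, license metadata, acquisition receipts), but the shape of the filter is the same: attach provenance to every record, then exclude anything that cannot be traced to a legitimate acquisition.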

Based on current signals. Events may develop differently.

Key Sources

@AndrewYNg

On Monday, a United States District Court ruled that training LLMs on copyrighted books constitutes fair use. A number of authors had filed suit against Anthropic for training its models on their books without permission. Just as we allow people to read books and learn from them …

@ednewtonrex

Lots more to say on this, but the US Copyright Office’s report on generative AI training is superb. It rejects the idea that all gen AI training is fair use, and it makes it clear that licensing training data is the way forward. Plus it destroys a bunch of AI booster mantras alon…

@ednewtonrex

🚨 Another AI fair use ruling today, and this one is *much* better for creators. 🚨 tl;dr: The judge said "In many circumstances it will be illegal to copy copyright-protected works to train generative AI models without permission." Authors sued Meta for training on their books; …

@MelMitchell1

On the topic of AI training on copyrighted data, many people have echoed the argument made by Andrew Ng below. But it would be interesting to think about what copyright law would be like if humans had the ability to memorize entire books and recite them when prompted to do so.

@IATSE

Statement on Three US Policy Developments Regarding Artificial Intelligence for Behind-the-Scenes Entertainment Workers: 1. Regarding the U.S. Copyright Office’s “Copyright and Artificial Intelligence, Part 3: Generative AI Training” Report: “We commend the U.S. Copyright Office’…

@NBCNews

A federal judge has sided with Anthropic in a major copyright ruling, declaring that artificial intelligence developers can train on published books without authors’ consent. https://www.nbcnews.com/tech/tech-news/federal-judge-rules-copyrighted-books-are-fair-use-ai-training-rcn…

@GaryMarcus

👏“I can only support generative AI that doesn’t exploit creators by training models — which may replace them — on their work without permission.” -- @ednewtonrex

@JTillipman

If you care about the future of AI regulation, you have 1 day left to comment on @USGSA's proposed AI procurement clause. This is not just about government AI. In its current form, the proposed clause could reshape the broader AI market. The clause reaches any company whose AI sy…

@mjashanks

Show HN: Budibase Agents Beta – model-agnostic AI agents for internal workflows

OpenAI Won the Consumer Mindshare—And Paid For It With Everything Else

A thorough investigation using the most neutral sources I could find

Timeline

  1. Investigation Published

    A comprehensive report details the long-term institutional costs of OpenAI's rapid growth strategy.

  2. Commercial Expansion Peak

    OpenAI secures record-breaking funding rounds while simultaneously facing lawsuits and internal resignations.

  3. Andrew Ng Analyzes Impact

    AI leader Andrew Ng publishes a summary of the ruling, highlighting the importance of data-centric AI practices.

  4. Meta Ruling Issued

    Judge Chhabria grants Meta a fair use victory but writes a scathing opinion against the legality of unlicensed AI training.

  5. Anthropic Fair Use Ruling

    Judge Alsup rules that training on books is fair use, likening AI training to human learning, but denies protection for pirated materials.

  6. Safety Team Dissolution

    The Superalignment team is disbanded following the departures of Ilya Sutskever and Jan Leike.

  7. Ilya Sutskever Departs

    The co-founder and Chief Scientist leaves to start a new venture focused solely on safe superintelligence.

  8. Leadership Crisis

    Sam Altman is briefly ousted and then reinstated, revealing deep rifts between the board and management.

  9. ChatGPT Launch

    OpenAI releases ChatGPT, instantly capturing global attention and starting the AI arms race.

  10. Capped-Profit Pivot

    OpenAI creates a for-profit subsidiary to attract venture capital and scale compute.

  11. OpenAI Founded

    Launched as a non-profit research lab with $1 billion in commitments to build safe AGI.
