Meta used copyrighted books for AI training despite its own lawyers' warnings, authors allege

Post time: 12-13 2023 Source: Reuters Author: Katie Paul

tags: copyright AI Meta

NEW YORK, Dec 12 (Reuters) - Meta Platforms' (META.O) lawyers had warned it about the legal perils of using thousands of pirated books to train its AI models, but the company did it anyway, according to a new filing in a copyright infringement lawsuit initially brought this summer.

The new filing late on Monday night consolidates two lawsuits brought against the Facebook and Instagram owner by comedian Sarah Silverman, Pulitzer Prize winner Michael Chabon and other prominent authors, who allege that Meta has used their works without permission to train its artificial-intelligence language model, Llama.

A California judge last month dismissed part of the Silverman lawsuit and indicated that he would give the authors permission to amend their claims.

Meta did not immediately respond to a request for comment on the allegations.

The new complaint, filed on Monday, includes chat logs of a Meta-affiliated researcher discussing procurement of the dataset in a Discord server, a potentially significant piece of evidence indicating that Meta was aware that its use of the books may not be protected by U.S. copyright law.

In the chat logs quoted in the complaint, researcher Tim Dettmers describes his back-and-forth with Meta's legal department over whether use of the book files as training data would be "legally ok."

"At Facebook, there are a lot of people interested in working with (T)he (P)ile, including myself, but in its current form, we are unable to use it for legal reasons," Dettmers wrote in 2021, referring to a dataset Meta has acknowledged using to train its first version of Llama, according to the complaint.

The month prior, Dettmers wrote that Meta's lawyers had told him "the data cannot be used or models cannot be published if they are trained on that data," the complaint said.

While Dettmers does not describe the lawyers' concerns, his counterparts in the chat identify "books with active copyrights" as the biggest likely source of worry. They say training on the data should "fall under fair use," a U.S. legal doctrine that protects certain unlicensed uses of copyrighted works.

Dettmers, a doctoral student at the University of Washington, told Reuters he was not immediately able to comment on the claims.

Tech companies have been facing a slew of lawsuits this year from content creators who accuse them of ripping off copyright-protected works to build generative AI models that have created a global sensation and spurred a frenzy of investment.

If successful, those cases could dampen the generative AI craze, as they could raise the cost of building the data-hungry models by compelling AI companies to compensate artists, authors and other content creators for the use of their works.

At the same time, new provisional rules in Europe regulating artificial intelligence could force companies to disclose the data they use to train their models, potentially exposing them to more legal risk.

Meta released a first version of its Llama large language model in February and published a list of datasets used for training, including "the Books3 section of ThePile." The person who assembled that dataset has said elsewhere that it contains 196,640 books, according to the complaint.

The company did not disclose training data for its latest version of the model, Llama 2, which it made available for commercial use this summer.

Llama 2 is free to use for companies with fewer than 700 million monthly active users. Its release was seen in the tech sector as a potential game-changer in the market for generative AI software, threatening to upend the dominance of players like OpenAI and Google that charge for use of their models.
