Jan 2 (Reuters) - If 2023 was the year that artificial intelligence changed everything, 2024 could go down as the year that U.S. copyright law changes AI.
The explosion of generative AI and the popularity of products from Microsoft-backed (MSFT.O) OpenAI, Meta Platforms (META.O), Midjourney and others led to a spate of copyright cases by writers, artists and other copyright holders who say AI has only succeeded thanks to their work.
Judges so far have been skeptical of the plaintiffs' infringement claims based on the content generated by AI. But courts have not yet addressed the trickier, potentially multi-billion-dollar question of whether AI companies are infringing on a massive scale by training their systems with reams of images, writings and other data scraped from the internet.
Tech companies warn that the lawsuits could create giant roadblocks for the burgeoning AI industry. The plaintiffs say the companies owe them for using their work without permission or compensation.
THE CASES
Several groups of authors have filed proposed class-action lawsuits this year over the use of their text in AI training. They include writers ranging from John Grisham and "Game of Thrones" author George R.R. Martin to comedian Sarah Silverman and former Arkansas governor Mike Huckabee.
Similar lawsuits have also been filed by copyright holders including visual artists, music publishers, stock-photo provider Getty Images and the New York Times.
They all argue that tech companies infringe their copyrights by taking and reproducing their materials without permission for AI training. The plaintiffs are asking for monetary damages and for court orders blocking the misuse of their work.
THE DEFENSE
Tech companies have hired legions of lawyers from some of the country's largest law firms to fight the cases. They have defended their AI training in comments to the U.S. Copyright Office, comparing it to how humans learn new concepts and arguing that their use of the material qualifies as "fair use" under copyright law.
"Just as a child learns language (words, grammar, syntax, sentence structure) by hearing everyday speech, bedtime stories, songs on the radio, and so on, a model 'learns' language by being exposed — through training — to massive amounts of text," Meta told the office.
AI proponents also argued that adverse rulings would be disastrous for the industry, which they say has relied on a reasonable assumption that copyright law permits their use of the data.
Silicon Valley venture-capital firm Andreessen Horowitz said that "imposing the cost of actual or potential copyright liability on the creators of AI models will either kill or significantly hamper their development."
Copyright owners, meanwhile, point to the companies' enormous success with AI programs like OpenAI's large language model-based (LLM) chatbot ChatGPT — and say they have money to spare.
"Licensing the copyrighted materials to train their LLMs may be expensive — and indeed it should be given the enormous part of the value of any LLM that is attributable to professionally created texts," writers trade group The Authors Guild told the copyright office.
WHAT'S NEXT
An ongoing lawsuit involving Thomson Reuters (TRI.TO) — the parent company of Reuters News — could be one of the first major bellwethers for AI copyright issues.
The information-services company accused Ross Intelligence in 2020 of illegally copying thousands of "headnotes," summaries of points of law in court opinions, from Thomson Reuters' Westlaw legal research platform to train an AI-based legal search engine.
A federal judge ruled in September that the Delaware case must go to trial to determine whether Ross broke the law. The case could set a key early precedent on fair use and other questions for AI copyright litigation.
A jury could begin to hear the case as early as next August.