Sarah Silverman sues OpenAI, Meta for being 'industrial strength plagiarists'

Comedian and author Sarah Silverman. Jason Kempin / Staff | Getty Images North America

On Friday, the Joseph Saveri Law Firm filed US federal class action lawsuits on behalf of Sarah Silverman and other authors against OpenAI and Meta, accusing the companies of unlawfully using copyrighted material to train AI language models such as ChatGPT and LLaMA.

Other represented authors include Christopher Golden and Richard Kadrey, and an earlier class action lawsuit filed by the same firm on June 28 included authors Paul Tremblay and Mona Awad. Each lawsuit alleges violations of the Digital Millennium Copyright Act, unfair competition laws, and negligence.

The Joseph Saveri Law Firm is no stranger to press-friendly lawsuits against generative AI. In November 2022, the firm filed a lawsuit over GitHub Copilot, alleging copyright infringement. In January 2023, it repeated the formula with a class action lawsuit against Stability AI, Midjourney, and DeviantArt over AI image generators. The GitHub lawsuit is still pending, according to attorney Matthew Butterick, and procedural maneuvering in the Stability AI case is ongoing, with no clear outcome yet.

In a press release last month, the law firm described ChatGPT and LLaMA as "industrial-strength plagiarists who violate the rights of book authors." The authors and editors have been contacting the law firm since March 2023, attorneys Joseph Saveri and Butterick wrote, because the authors "are concerned" about "the uncanny ability of these AI tools to generate similar text to that found in copyrighted textual records, including thousands of books."

The most recent lawsuits, brought on behalf of Silverman, Golden, and Kadrey, were filed in US District Court in San Francisco. In each case, the authors have demanded a jury trial and are seeking a permanent injunction that could force Meta and OpenAI to make changes to their AI tools.

Meta declined Ars' request for comment. OpenAI did not immediately respond to Ars' request for comment.

A spokesperson for the Joseph Saveri Law Firm sent Ars a statement saying, "If this alleged behavior is allowed to continue, these models will eventually replace the authors whose stolen works power these AI products with which they compete. This new suit represents a broader fight to preserve the property rights of all artists and other creators."

Accused of using 'patently illegal' datasets

Neither Meta nor OpenAI has fully disclosed the contents of the datasets used to train LLaMA and ChatGPT. But attorneys for the authors say they deduced the likely data sources from clues in statements and documents released by the companies or associated researchers. The authors accuse both OpenAI and Meta of using training datasets containing copyrighted material distributed without the consent of the authors or publishers, including works downloaded from some of the largest e-book piracy sites.

In the OpenAI lawsuit, the authors allege that, based on OpenAI's disclosures, ChatGPT appears to have been trained on 294,000 books allegedly downloaded from "notorious 'shadow library' websites like Library Genesis (aka LibGen), Z-Library (aka B-ok), Sci-Hub, and Bibliotik." Meta revealed that LLaMA was trained on part of a dataset called ThePile, which the
