October 17, 2025, Cryptopolitan

Author group sues Salesforce for building XGen AI models on pirated books library

Salesforce, a software giant, has been sued by a group of authors in federal court in San Francisco for building its XGen AI models on a pirated library of books. According to the lawsuit, the company scrubbed references to those sources once questions arose.

The lawsuit was filed on Wednesday by authors ￰2￱ Tanzer and Jennifer Gilmore under the Copyright Act. It alleges ongoing infringement, saying Salesforce “continues to do so by continuing to store, copy, use, and process the datasets containing copies of Plaintiffs’ copyrighted books.” The complaint cites statements from Salesforce CEO Marc Benioff, who told a Bloomberg interviewer in January 2024 that AI companies ripped off training data and that all the training data has been stolen.

The authors seek class certification for all US copyright holders whose works have been used since October ￰5￱ are seeking statutory damages, the destruction of infringing copies, the return of profits, a declaration of willful infringement, and attorneys’ fees.

Salesforce faces a strong case; AI companies escaped similar claims

According to the complaint, Salesforce pirated hundreds of thousands of copyrighted books to develop its XGen series of large language models. It did this by using the “notorious RedPajama and The Pile datasets,” which include a book corpus called Books3 containing more than 196,000 books copied from the private tracker ￰8￱ filing states that Salesforce first mentioned “RedPajama-Books” as one of its training sources when it launched XGen in June ￰9￱ engineer for the company then linked GitHub users directly to both datasets.

However, by September, those mentions had been taken down from Salesforce’s website and replaced with vague descriptions of “natural language data” from “publicly available sources.” The next month, Hugging Face, the site that hosted Books3, removed the dataset due to copyright concerns. The lawsuit also alleges that in 2022, Salesforce trained its CodeGen models on The Pile. The company then brought the technology to market through its Agentforce AI platform, with the XGen-Sales model released in October 2024.

However, according to experts, authors must prove real financial harm, not just that their books were used for training. Recently, Judge Vince Chhabria dismissed similar claims against Meta, ruling that “simply claiming ‘our work was used’ isn’t enough.” The judge found Meta’s use of copyrighted books to train AI to be fair use.

Additionally, as reported by Cryptopolitan, recent rulings have favored OpenAI and Anthropic in similar cases, with judges finding that authors failed to prove market harm. However, one judge criticized Anthropic for maintaining a permanent library of pirated books.

Salesforce taps Google’s Gemini AI to power Agentforce 360

In other news, Salesforce has extended its partnership with Google to include deeper integration of Gemini AI models with its Agentforce 360 platform. Gemini’s multimodal intelligence will be integrated into the Salesforce ecosystem as a result of the partnership. It will help support tasks such as hybrid reasoning and multi-step process automation across enterprise sales and IT ￰13￱ expanded integration enables the Atlas Reasoning Engine, central to Agentforce 360, to leverage Gemini models. This gives enterprise workflows additional model options.

Additionally, the hybrid reasoning capability enables users to set up AI agents within Salesforce that produce consistent and accurate results. The collaboration also extends the reach of Salesforce’s Gemini integration, previously limited to Gmail, to other Google Workspace applications, including Sheets, Docs, Drive, Slides, and ￰16￱ 360 now supports native interoperability with Google Workspace, allowing users to initiate sales engagements, qualify leads, and schedule meetings from within applications like Gmail and Google ￰17￱ also provides direct access to Salesforce Customer 360 apps within Google tools, streamlining data access and workflow continuity for sales and service teams.

Salesforce chief scientist Silvio Savarese said, “In the enterprise environment, it’s imperative for AI agents to be highly capable and highly consistent, especially for critical use cases … Together, we are setting a new standard for building the future of what’s possible in the Agentic Enterprise down to the model level.”
