Millions of books bought, cut up and fed to AI to train Claude and sidestep copyright: the Anthropic case erupts

A book, before becoming a file, is still an object. It has a spine, pages, glue, weight, dust. In the case of Anthropic, the company that develops Claude, this very concrete side ended up in an industrial process: books purchased on the second-hand market, cut, scanned and transformed into digital text. What remained of the volumes was then sent for recycling.

The internal name of the project was Project Panama. The documents that emerged in the copyright case make the purpose of the operation clear: to collect a large quantity of physical books to train artificial intelligence models. Books were useful because they were considered better linguistic material than the noise of the web. Fewer sentences collected at random online, more texts written, edited and published.

From books to data

The most striking part is the method. The volumes arrived from second-hand dealers, were prepared for destructive scanning, cut along the spine and passed through professional high-speed scanners. Once digitized, the books did not come back. What remained was data on one side, paper to be recycled on the other.

The precise quantities are not entirely clear, but there is talk of hundreds of thousands, perhaps millions of volumes, with a project designed to digitize between 500,000 and 2 million books in about six months. Not a small archive operation. A supply chain, with suppliers, warehouses, cutting machines, scanners, costs and logistics.

And this is where the case becomes interesting even outside the legal debate. Artificial intelligence is often described as something light, remote, almost immaterial: cloud, algorithm, clean interface. Here, however, the cloud makes the noise of paper. It has boxes, industrial blades, detached pages, books bought and dismantled.

The copyright issue

In the US proceedings, Judge William Alsup distinguished two levels. The use of legally purchased and then scanned books to train Claude was considered compatible with fair use, the American doctrine that in some cases allows the use of protected works without authorization. The matter of the pirated books was different: the documents revealed that Anthropic had downloaded and stored millions of texts from illegal archives, and that part was treated as a separate violation.

The transition to used physical books therefore also appears to be a choice of legal prudence. Buying a paper copy gave the company more solid ground than downloading from pirate libraries. In the United States, anyone who buys a physical object can resell it, lend it or destroy it. The problem arises when that object is transformed into a digital copy and inserted into systems capable of generating new text.

Anthropic then accepted a $1.5 billion deal to close the authors' class action, without admitting liability. The settlement covers pirated works and provides approximately $3,000 per book involved. As of May 2026, however, final approval was still under review: Judge Araceli Martinez-Olguin asked for further details on legal fees and payments to the lead plaintiffs.

AI doesn’t arise out of nowhere

The Anthropic case concerns Claude, but it speaks to the entire industry. Large generative models need texts, images, code, articles, manuals, novels, essays. They need human labor already produced. Sometimes that work is authorized and paid for. Other times it is collected en masse, stuffed into opaque datasets and discussed only when a lawsuit arrives.

Project Panama makes this dependence visible. To make a machine write better, books written by people were needed. To make a chatbot more natural, works created by authors, editors, translators, proofreaders, publishing houses, libraries and readers were used. The digital promise still relies on very physical matter.

The topic also concerns Europe, where the relationship between copyright, data mining and artificial intelligence remains unsettled. Companies talk about innovation, transformation, progress. Those who create content ask for permissions, compensation, traceability. In the middle are courts, rules that are still young and a very concrete question: how much is human labor worth when it becomes fuel for AI?