Today we are forced to share some sad news - yesterday many of our domains were seized again. We should highlight that the majority of the seized domains were not mirrors of the Z-Library website. Instead, they were separate sub-projects, containing only books in rare languages of the world, and their blocking is perplexing. For instance, these domains included books in Tamil, Mongolian, Catalan, Urdu, Pashto, and other languages:

afrikaans-books.org

bengali-books.org

urdu-books.org

marathi-books.org

chamorro-books.org

Over the 15 years of the project's existence, we've managed to assemble an impressive collection of rare texts in many uncommon languages. These domains featured many unique texts that can't be found anywhere else, including rare books, documents, and manuscripts. All of this is a priceless heritage, contributing to the preservation and study of world cultures, and serving as important material for researchers in linguistics, anthropology, and history.

Z-Library also states in the blog post that they did not lose the files, just the domains.

 

Authorities have caught two Ho Chi Minh City-based firms printing more than 15,000 pirated copies of books and 2024 calendars, with a combined weight exceeding 15 metric tons.

Police officers from the Ministry of Public Security and the city, in coordination with inspectors from the municipal Department of Information and Communications, raided Kien A Packing Production and Trading Service Company in the city’s outlying Cu Chi District on Monday morning.

The company was caught with 3,000 illegally printed copies of ‘Kinh Truong Tho diet toi’ (Long-life sutra destroys sins) from Ton giao (religion) Publishing House and 9,000 illegally printed copies of ‘Sherlock Holmes’ from the Writers’ Association Publishing House. The combined weight of the books was 10 metric tons.

Authorities had not given their approval for the books to be printed. All of the illegally printed copies have since been seized.

 

cross-posted from: https://lemmy.world/post/1330512

Below are direct quotes from the filings.

OpenAI

As noted in Paragraph 32, supra, the OpenAI Books2 dataset can be estimated to contain about 294,000 titles. The only “internet-based books corpora” that have ever offered that much material are notorious “shadow library” websites like Library Genesis (aka LibGen), Z-Library (aka B-ok), Sci-Hub, and Bibliotik. The books aggregated by these websites have also been available in bulk via torrent systems. These flagrantly illegal shadow libraries have long been of interest to the AI-training community: for instance, an AI training dataset published in December 2020 by EleutherAI called “Books3” includes a recreation of the Bibliotik collection and contains nearly 200,000 books. On information and belief, the OpenAI Books2 dataset includes books copied from these “shadow libraries,” because those are the most sources of trainable books most similar in nature and size to OpenAI’s description of Books2.

Meta

Bibliotik is one of a number of notorious “shadow library” websites that also includes Library Genesis (aka LibGen), Z-Library (aka B-ok), and Sci-Hub. The books and other materials aggregated by these websites have also been available in bulk via torrent systems. These shadow libraries have long been of interest to the AI-training community because of the large quantity of copyrighted material they host. For that reason, these shadow libraries are also flagrantly illegal.

This article from Ars Technica covers a few more details. Filings are viewable at the law firm's site here.