> Do you ask for permission when you train your mind on copyrighted books? Yes, ...

sircastor · on Jan 22, 2023

Copyright does not grant the right to consume content. It only deals with distribution, whether that is done through a copy, or a derivative.

It is not illegal to read a stolen book, only to steal the book.

flawi · on Jan 22, 2023

This is literally what the AI does as well. It didn't walk into a bookstore and steal all the books off the shelf, it read through material made available to it entirely legally.

The thing that authors are trying to argue here is that they should get to control what type of entity should be allowed to view the work they purchased. It's the same as going "you bought my book, but now that I know you're a communist, I think the courts should ban you from reading it".

denton-scratch · on Jan 22, 2023

> they should get to control what type of entity should be allowed to view the work they purchased

No, that's not it. It's more like if I memorized a bunch of pop-songs, then performed a composition of my own whose second verse was a straight lift of a song by Madonna. I would owe her performance royalties. And I would be obliged to reproduce her copyright notice, so that my audience would know that if they pull the same stunt, they're on the hook for royalties too.

Dylan16807 · on Jan 22, 2023

There are lots of people arguing against the training itself. And people arguing against all outputs, even when there is no detectable copying. I don't know how you missed those takes. You're arguing the wrong point here. Many people do want to say "no ai can look".

htfu · on Jan 22, 2023

Only if you released it. You could definitely perform it in the shower without owing anything. And the 99% of your compositions that didn't wholesale mirror any specific song would be perfectly fine to release.

Now, moving from holding the model creator culpable to the user would obviously be problematic as well, since they have no way of knowing whether the output is novel or a copy paste. Some sort of filter would seem to be the solution, it should disregard output that exactly or almost exactly matches any input.

Winsaucerer · on Jan 22, 2023

But it's not humans reading it, it's using it to train ML models. There are similarities between humans learning from books and ML models being trained on it, but there are also salient differences, and those differences lead to concerns. E.g., I am concerned about these large tech companies being the gatekeepers of AI models, and I would rather see the beneficiaries and owners of these models also be the many millions or billions of content creators who first made them possible.

It's not obvious to me that the implicit permission we've been granting for humans to view our content for free also means that we've given permission for AI models to be trained on that data. You don't automatically have the right to take my content and do whatever you like with it.

I have a small inconsequential blog. I intended to make that material available for people to read for free, but I did not have (but should have had!) the foresight to think that companies would take my content, store it somewhere else, and use it for training their models.

At some point I'll be putting up an explicit message on my blog denying permission to use for ML training purposes, unless the model being trained is some appropriately open-sourced and available model that benefits everyone.

chii · on Jan 22, 2023

> You don't automatically have the right to take my content and do whatever you like with it.

actually you don't have the right to restrict the content, except as part of what's allowed in copyright law (those rights a spelt out - like distribution, broadcasting publicly, making derivative works).

specifically, you cannot have the right to restrict me from reading the works, and learning from it.

Imagine a hypothetical scenario - i bought your book, and counted the words and letters to compile some sort of index/table, and published that. Not a very interesting work, but it is transformative, and thus, you do not own copyright to my index/table. You cannot even prevent me from doing the counting and publishing.

Winsaucerer · on Jan 22, 2023

I assume you’re referring to US law here. Is there a handy place where these permitted restrictions are listed and described?

chii · on Jan 23, 2023

https://copyright.gov/title17/92chap1.html#106

The section titled "Exclusive rights in copyrighted works".

There are 6 rights.

(1) to reproduce the copyrighted work in copies or phonorecords;

(2) to prepare derivative works based upon the copyrighted work;

(3) to distribute copies or phonorecords of the copyrighted work to the public by sale or other transfer of ownership, or by rental, lease, or lending;

(4) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works, to perform the copyrighted work publicly;

(5) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work, to display the copyrighted work publicly; and

(6) in the case of sound recordings, to perform the copyrighted work publicly by means of a digital audio transmission.

alpaca128 · on Jan 22, 2023

> It didn't walk into a bookstore and steal all the books off the shelf, it read through material made available to it entirely legally.

Github ignored the licenses of countless repos and simply took everything posted publicly for training. They didn't care whether it was available to them entirely legally, they just pretended that copyright doesn't exist for them.

Dylan16807 · on Jan 22, 2023

Isn't the definition of public repo that anyone is allowed to download and read it?

SonicTheSith · on Jan 22, 2023

Nope, public repos have license, often open source licenses that state that you can freely use the code, or change it , but only if the resulting product will also be opensource.

Other licenses such as the MIT license require that you name the original creator.

Dylan16807 · on Jan 22, 2023

You don't need to accept that license to download and read the code.

A license allows new uses that copyright would otherwise block. Some kinds of AI training are fully local and don't make the AI into a derivative work, so they don't need any attribution and you don't need to accept the license to distribute.

oneeyedpigeon · on Jan 22, 2023

But no license (that I'm aware of) says "You are allowed to read this source code, but you may not produce work as a result of learning from it"; for a start, that would clearly be impractical to enforce.