Is it transformative if I take all the pages in Hanya Yanagihara's A Little Life and use a thesaurus to change every second word?
Or a more realistic scenario: what if I translate it to Spanish without license from the author? That's not allowed, and yet I have "transformed" the work in the same way that an LLM does.
If I buy a book entitled "How to make a table" and then make a table, the author does not own the table I made.
If I buy a book and use it to prop up a table, the author likewise does not own the table, or any works I undertake on that table.
If I buy a book and rip out the pages to make a collage, the US is the only legal jurisdiction where I run even a slight risk of civil penalties.
An LLM is downstream of a book. Using a book to make an LLM does not confer any rights or privileges over the LLM on the original author, just as using a hammer or nails doesn't entitle the hammer or nail manufacturers to any royalties on what I make, even if I build a hammer-making machine with them. There's no right to the works of people who build on your work without reproducing your work, at least outside of strict copyleft.
It's like demanding a cut from people who learned how to use Photoshop by watching your Photoshop tutorial videos on YouTube.
This is why the most successful cases against LLMs have been on the "Did they purchase the book?" side of the fence, and not on the "What did they do with it?" side. The one exception is the case where the legal company tried to use the LLM to reproduce 1:1 the content they had a limited license to, but that's obviously a no-go and they should have known better.
> Is it transformative if I take all the pages in Hanya Yanagihara's A Little Life and use a thesaurus to change every second word?
If you meant it literally, I'd think that such a version would be a sort of parody. It'd be up to lawyers doing their cross-examinations to prove the work was intended for such a purpose, though.
> Or a more realistic scenario: what if I translate it to Spanish without license from the author? That's not allowed, and yet I have "transformed" the work in the same way that an LLM does.
A lawyer would probably answer this better than me, but the 'content' is the same and would violate copyright. There are also other factors, like whether it was translated/distributed for free.
Besides that, I regard LLMs as holding mathematical observations, in contrast to a translated work. So long as the user ensures the output isn't close to what's already available, imo it fits the transformative criteria.
You cannot claim that a formulaic thesaurusing of a text is parody, not unless the process is related to the message of the original text itself. Even then, that's a dubious claim. Especially if it was done automatically.
I can just as well say that a translated work contains "linguistic observations". In fact a translator has to do a lot of transformative work in order to translate a text.
An LLM just takes a set of texts, looks at n-gram distributions, and generates similar text. It is quite literally a fuzzy way of copying. There aren't any mathematical observations in the output. Any math (statistics) is done in the copying process.
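To make the n-gram picture concrete, here is a minimal sketch of a bigram (2-gram) text generator. It is only a toy illustration of the "look at n-gram distributions, generate similar text" description above, not a claim about how any production LLM is implemented; the function names `train_bigram` and `generate` and the sample corpus are made up for this example.

```python
import random
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, how often each other word follows it."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, seed=0):
    """Sample up to `length` words, each drawn from the followers of the previous one."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last word never had a successor in the training text
        choices, weights = zip(*followers.items())
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(out)


# Hypothetical toy corpus, purely for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigram(corpus)
sample = generate(model, "the", 8)
print(sample)
```

Every adjacent word pair in the output is a pair that occurred in the training text, which is the sense in which this kind of model can only recombine what it saw.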
> You cannot claim that a formulaic thesaurusing of a text is parody, not unless the process is related to the message of the original text itself. Even then, that's a dubious claim. Especially if it was done automatically.
Oh, even if it's not a parody, it would look transformed enough that a first-time reader would get a completely different interpretation of the story compared to the original source. And that's all that matters.
> There aren't any mathematical observations in the output. Any math (statistics) is done in the copying process.
Wrong. Weights, which these models consist of, are literally numbers in an extensive mathematical equation.
> It is quite literally a fuzzy way of copying.
And no one knows what a 'fuzzy way of copying' is; there is no consensus on it. It is either copying or it is not. You could say that training an LLM abstracts various texts, thereby transforming the source material, and then transforms it a second time by integrating it into its weights.
Even if it involved copying, that isn't immediately an issue. It's the distribution of a copy that's an issue. And if you look at the data side by side, you can see that while copying might be part of the process of creating an LLM, the LLM is not a copy of its source material.