> If the AI industry is to survive, we need a clear legal rule that neural networks, and the outputs they produce, are not presumed to be copies of the data used to train them. Otherwise, the entire industry will be plagued with lawsuits that will stifle innovation and only enrich plaintiffs' lawyers.
Or maybe, get this, how about people running AI only feed them information that they legally have the right to use? How is it a bad thing that somebody can't legally steal other people's work without their permission because of pesky copyright?
As an extension of this, only allow children to look at works they purchased publication rights to, lest their creative output becomes influenced by a different person's style.
There is absolutely no comparison here, because children don't charge you to look at their artwork; if you ask nicely, they will probably give it to you for free. Companies using other people's work without permission to train AI will charge.
Your suggestion would be accurate if we lived in a world where we all shared, and there was no money, and copyright didn't exist, but we don't.
It is my understanding that it makes no legal difference (at least in my country) whether I charge for my work or not when it infringes somebody's copyright. Simply sharing it is sufficient to get into trouble.
It sounds like your country is the weird one. Try burning the complete works of Disney onto stacks of DVDs, then go down to your high street and hand them out. See how long you get away with that for.
I think the big difference is distribution vs consumption. Where I live, there are additional clauses in law, for mass reproduction and selling.
But republishing any work as your own probably falls into that category. And it isn't about profit but commercial use; thus pasting it onto a blog to improve your business (rankings, hit count) is a business use case.
There are plenty of competitors to the corporate AI models that are freely distributed, free to use, etc. You just need to have the hardware, which is admittedly pricey. The worst outcome is if there is a legal risk in creating AI models that only big companies with an army of lawyers large enough to fend off lawsuits can afford to face. Then you'd have the continuation of big tech controlling things for "responsibility" reasons instead of AI being a technology anybody can use.
These AI companies are making serious amounts of money (OpenAI is valued in the tens of billions) on the back of artists who never gave permission for their work to be used in this way.
If a child took an artist's work, copied it and made significant amounts of money from selling it then yes they should be within the purview of copyright law.
> who never gave permission for their work to be used in this way.
Copyright isn't all-encompassing. There's only an enumerated set of rights granted, and "this way" (i.e., training an AI model) is not one of those restricted activities (unlike distribution or broadcast).
Unless the model can be argued to be a derivative work of the training data set (which I don't believe it is, since the process of training is sufficiently transformative, IMHO), the original copyright holders of the training data do not need to be asked for permission.
I wouldn't bet for or against that without legal advice, and even then it might vary by jurisdiction. Legal fictions are a thing, but I'm no lawyer, and I know better than to assume my interpretation of any legal issue is any better than a Hollywood script writer's: https://en.wikipedia.org/wiki/Legal_fiction
Exactly. If I produce a work, and then you produce an identical-enough work, you're infringing my copyright. I don't care how you did it, it makes absolutely no difference to me.
Artists get inspired by others all the time, and if the results are far enough from each other, then nobody has a problem with that. In fact, pre copyright, the similarities used to be even larger. Art lives from the concept of taking ideas, and improving on them.
The EU has a copyright exemption for noncommercial model training, and at least the UK is changing that to even allow commercial model training, without even an opt-out required. So it appears they legally have the right to use anything.
Why should you want a model designed to know all human knowledge to know only public-domain knowledge?
> Or maybe, get this, how about people running AI only feed them information that they legally have the right to use? How is it a bad thing that somebody can't legally steal other people's work without their permission because of pesky copyright?
On one extreme:
"Unless you pay your annual Disney fee for having watched Disney films in early childhood, you will need to return your brain to us for processing. Disney was used as the basis for all concepts you know, and as such, Disney owns all subsequent intellectual output of your brain."
And on the other:
In the age of AI, copyright will cease to hold weight. We'll make more new content on a per-month basis than all of recorded human history. The old regime must be thrown away to accommodate the radically new world we're entering.
We'll land somewhere in-between, and I'm hoping it's much closer to (or even precisely) the latter.
Very curious that so many people adopted this position exactly when it became feasible for giant corporations to profit by mass producing laundered copyrighted works!
Those wanting AI to respect copyright are going to find that the big players will navigate copyright just fine. It's the small players that won't. They're advocating for institutional control over AI.
One of the first commercial uses of modern neural networks was Microsoft laundering GPL code with Copilot, so I’m not really sure what you mean by saying that “big players will navigate copyright just fine”.
There are laws in the EU that only apply to large companies, like Facebook et al., because they have much more power in certain spaces. Similar laws can be made for Disney vs. small studios, e.g. "if turnover is less than 100M EUR/month..." - I feel this is often presented as a false dichotomy.
> how about people running AI only feed them information that they legally have the right to use?
Well, this is the crux of the matter, isn't it? Do you, a human, have the right to look at copyrighted works and learn from them? Do you have the right to use AI to do the same?
> Copyright only governs publishing. So you have the right to train AI with any and all data you have access to, as far as copyright is concerned.
Sure, but you still can't output anything that looks like a derivative or copied work.
So, maybe ... how about if image-generation nets hold onto the training images, so they can compare generated output against the training data to ensure it is not too similar?
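One hypothetical way to implement that kind of output filter is perceptual hashing: fingerprint each training image once, then flag any generated image whose fingerprint is within a small Hamming distance of a stored one. Everything below is an illustrative sketch, not a real system's API; a production filter would use far more robust hashes (pHash, embeddings) and an index rather than a linear scan.

```python
# Illustrative sketch of a near-duplicate output filter using a simple
# average hash ("aHash"). Images are modeled as lists of 64 grayscale
# values (an 8x8 thumbnail, 0-255) to keep the example dependency-free.

def average_hash(pixels):
    """64-bit hash: one bit per pixel, set if the pixel is at or above
    the mean brightness of the thumbnail."""
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits

def hamming_distance(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def too_similar(generated, training_hashes, threshold=10):
    """Flag a generated image whose hash lands within `threshold` bits
    of any stored training-image hash (smaller distance = more alike)."""
    h = average_hash(generated)
    return any(hamming_distance(h, t) <= threshold for t in training_hashes)

# Usage: hash each training image once, then screen every output.
training_hashes = [average_hash([10] * 32 + [200] * 32)]
near_copy = [12] * 32 + [198] * 32   # almost identical dark/bright split
original = [200, 10] * 32            # very different alternating pattern
print(too_similar(near_copy, training_hashes))  # True: flagged as a near-copy
print(too_similar(original, training_hashes))   # False: passes the filter
```

Note the obvious trade-off the thread hints at: this requires the model operator to retain (hashes of) the entire training set, and a simple hash only catches close reproductions, not stylistic imitation.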
Derivative works are their own things (when sufficiently derivative). And AIs are not humans - using an algorithm does not automatically remove the copyright. See also "I uploaded a movie to YouTube but it's upside down, why did it get taken down".
But that's because the movie is still recognisable. Using an algorithm doesn't automatically remove copyright, but if the algorithm transforms the data to a point where it can't be recognised as the original work, then it isn't breaking copyright.
But the AI as a whole is capable of reproducing the original in a recognizable form, and it does so on demand quite easily, because it was trained on them - how is that different from selling a zip file containing millions of copyrighted works, plus a bunch of new stuff?
Copyright law hinges on human element of the actions taking place, not on mathematical technicality. The digits of pi are not creative human expression, nor are they derived from human expression, they're a factual mathematical discovery. They can neither infringe on copyright, nor are they subject to it themselves.