
>"The Co-Pilot suit is ostensibly being brought in the name of all open source programmers. Yes, that’s right, people crusading in the name of open source–a movement intended to promote freedom to use source code–are now claiming that a neural network, designed to save programmers the onus of re-inventing the wheel when they need code to perform programming tasks, is de facto unlawful."

Maybe people wouldn't be so angry about an AI trained on mostly open source code if said AI was open source, and not a proprietary SaaS.



> Maybe people wouldn't be so angry about an AI trained on mostly open source code if said AI was open source, and not a proprietary SaaS.

Exactly, that's the point. Open source doesn't mean liability-free; you still have to comply with the license!


You don't if you are creating an entirely new work based solely on the knowledge and patterns you've learned from looking at other code, i.e. like a human learning to code by reading millions of pages of code on GitHub. Whether an AI can learn in this way is the point of contention for AI art/code/chat generators.


If the AI "independently" comes up with a 1:1 copy of some piece of copyrighted code, would this be a copyright violation or not?

There's a reason why some programmers won't even look at leaked proprietary source code, so as not to accidentally introduce copyright violations into their own code.


> would this be a copyright violation or not?

It would be. When a human does this, does it invalidate the human's ability to create any new work at all? Should we chain up anyone who has violated copyright by perfectly recalling someone's art and re-drawing it from memory, since we cannot trust them ever to create an original work again?


Copyright is (mostly) not about copying for your own use, but about commercial exploitation. This topic has been discussed to death since at least the Sony Walkman. None of this is new or different just because an algorithm is now involved.

If you copy for your own use only, that's totally fine, or at most a legal grey zone; in the end nobody will care about such personal-use copies. If you use AI to generate pretty pictures to hang up in your home, that's totally fine too.

As soon as you start making money with this stuff though it becomes an actual problem.

It's really as simple as that.

Even the 'generative art' aspect was settled long ago, when music sampling became popular and required a legal framework.


Software and human beings are two different sorts of things and should be treated differently.


The trick here is to implicitly personify the AI (a program) by comparing it to a human, because they are both “learning”.

There’s no reason why we should have the same standards for programs and humans based on metaphors.

If I log in to a website three times a day, I am simply using a website. If a program logs in to a website three thousand times in the span of a second from multiple IP addresses, that's probably a DoS attempt.


Human minds can commit copyright infringement without realizing it if they regurgitate parts of something someone has already created. Google “George Harrison.”


The "human learning" argument comes up a lot in every discussion about Copilot, but it's a completely different thing, and misleading. Human learning involves understanding beyond the words and sentences; Copilot doesn't know anything about our world.


Co-Pilot generated code is based on works published under a variety of licenses. The generated code must therefore be licensed according to the licenses of the code it was derived from. In many cases those licenses are not compatible, and the generated code, being derived from copyrighted and licensed works, is in violation of copyright law.


I think this interpretation works if the code being generated is seen as essentially being retrieved by a lossy lookup function.

But another interpretation is that what was learned from those works is the generic structure of the code, which is not copyrightable, and that generic structure was used to synthesize new code, in much the same way a human who saw a pattern in a proprietary codebase years ago can use that pattern in their own code. I am not a lawyer, but most licenses do not prohibit that in my experience. More often than not, in my experience, this is what is happening with generative AI.

The tricky bit is that the AI can probably do both in the eyes of copyright law, since the boundary seems to be very context dependent, and existing models don't have any concept of how much they need to compress and forget the specific details before the output is seen as novel by the courts. The model can memorize significant parts of some inputs despite not having nearly enough space to memorize the whole input set, so the first interpretation is possible even if it isn't the typical output. There isn't really a "courts will see this as novel" regularizer, and there might need to be.
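
To make the idea concrete, a naive check of that sort might look something like the sketch below. This is purely illustrative; the window size and toy corpus are made up, and nothing like this exists in current training pipelines:

    # Purely illustrative sketch of a "novelty" check that doesn't exist today:
    # flag generated output that reproduces long verbatim spans from the
    # training corpus (the 20-char window is an arbitrary assumption).
    def verbatim_spans(generated: str, corpus: list[str], window: int = 20):
        flagged = []
        for start in range(max(len(generated) - window + 1, 0)):
            chunk = generated[start:start + window]
            if any(chunk in doc for doc in corpus):
                flagged.append((start, chunk))
        return flagged

    corpus = ["def add(a, b):\n    return a + b\n"]  # stand-in for millions of files
    hits = verbatim_spans("def add(a, b):\n    return a + b\n", corpus)
    print(f"{len(hits)} verbatim spans found")  # every window matches in this toy case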


You are just hiding the more complex argument behind the word "learned", which, in the normal understanding of the word, is not something attributed to a computer.


I can expand on that a bit: the weights in the big generative models are still basically too small to hold a significant fraction of the input set with anything we would call compression today. This forces the model to strip the input down to some discovered bare structure; when humans do this we call it things like "archetypes" or "themes", and it's not generally copyrightable. Many LLMs aren't even trained for multiple epochs, so the model isn't optimized for memorization so much as it is forced to extrapolate on future examples. I'm arguing that the problem is that the computer has no knowledge of where the line is at which its output becomes plagiarism in our courts, not that it is always plagiarizing. I think it clearly can't always be plagiarizing, from anecdotal experience of using these models and from just doing back-of-the-envelope math on how many bits they have to memorize each input string.
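
The back-of-the-envelope version of that argument, with illustrative numbers chosen only for the sake of the calculation (not measured figures for any particular model):

    # Rough capacity argument; parameter count, precision and corpus size
    # below are illustrative assumptions, not figures for any real model.
    params = 10e9          # assumed number of model parameters
    bits_per_param = 16    # assumed storage precision per parameter
    tokens = 100e9         # assumed number of training tokens

    capacity_bits = params * bits_per_param
    bits_per_token = capacity_bits / tokens
    print(f"~{bits_per_token:.1f} bits of capacity per training token")
    # ~1.6 bits per token, versus the ~16 bits needed to store a token
    # verbatim from a ~50k-entry vocabulary (log2(50000) ~ 15.6), so the
    # model cannot be memorizing most of its input, even though heavily
    # repeated snippets can still be memorized.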


You equating that process to something that "humans do" is anthropomorphism.

It's not true that that is what humans do.

Knowing where the line is with regard to copyright liability is not an element required to prove liability, i.e. it's of no consequence that the infringer doesn't realize or know that they are infringing. Copyright is strict liability in that sense.


If viewed like this, you could argue that with every single line of code, open source devs are working towards:

1) SaaS AI people getting richer

2) Devs having less work in the future

I’m not sure if that’s a good development for the open source movement.



