Hacker News

Copilot is trained on publicly available code.


"Publicly available" is different from "licensed under a FOSS license". Without a license, no one other than the original author has the right to distribute, modify, or even execute the code.

Even if the code is licensed under an open source license, in most cases it requires users to attribute the original author, while in some cases derivative work must be licensed under the same license too.

GitHub Copilot uses the code without honoring these obligations.


You grant GitHub a license to display your code, and the same goes for Codeberg. They're not bound by your license because you're granting them specific rights. Secondly, can you prove damages when a person uses your code but doesn't attribute you? If you can't, your license isn't very useful. This is what we all need to understand.


How about mirrors?

GitHub doesn't prohibit mirroring repos or developing a fork of someone else's codebase.

In that case, the original author never granted GitHub a license to use their code, yet their code may still be used for training the model.


Displaying the code is different from selling it. GitHub's license explicitly says that it doesn't allow them to sell your code.


> distribute

Does moving code between a company's data centres count as distribution?

Heck, doesn't merely hosting the files count as distribution?


"Distribute" means distribution to an end user.


No, but Copilot can be considered a form of distribution.


It's publicly available code that comes with licenses governing how you can use it, and they ignore those licenses: some are share-alike, for example, yet they're charging for the product. That's an ethical issue.

In the early days (months after the public release, not the beta), it would literally give you code stolen from someone's repo, sometimes with credentials (a security risk) and project-specific stuff. That means that, because the code you took was verbatim to the naked eye, you would be held liable for a license violation.


But it doesn't respect the code's license.


Publicly available doesn't mean you can use it for your model. Licenses still exist.


Are they crawling the internet looking for all the publicly available code that exists or are they just training their model on the code that people upload directly to their servers?



