Hacker News | _1cjw's comments

> It is good to know that your base64 encoding function is tested for all corner cases, but integration and behaviour tests for the external interface/API are more important than exercising an internal implementation detail.

I think this depends heavily on which part is or is not being tested. For certain parts of the kernel, such as the firewall, I truly believe that such test cases (including corner cases) should be present.
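To make "tested for all corner cases" concrete, here is a minimal sketch of what such a test could look like for a base64 encoder. It uses the test vectors from RFC 4648, which cover the empty input and every padding length. The name `failing_vectors` is made up for this illustration, and Python's standard `base64` module only stands in for whatever implementation would actually be under test.

```python
import base64

# Test vectors from RFC 4648, section 10. They cover the empty input and
# every padding length (no padding, "=", "==").
RFC4648_VECTORS = [
    (b"", b""),
    (b"f", b"Zg=="),
    (b"fo", b"Zm8="),
    (b"foo", b"Zm9v"),
    (b"foob", b"Zm9vYg=="),
    (b"fooba", b"Zm9vYmE="),
    (b"foobar", b"Zm9vYmFy"),
]


def failing_vectors(encode=base64.b64encode, decode=base64.b64decode):
    """Return the raw inputs for which encoding or the decode round trip is wrong.

    `encode` and `decode` are parameters so that another implementation
    (hypothetical here) can be checked against the same vectors.
    """
    failures = []
    for raw, expected in RFC4648_VECTORS:
        if encode(raw) != expected or decode(expected) != raw:
            failures.append(raw)
    return failures


if __name__ == "__main__":
    # An empty list means every corner case passed.
    print(failing_vectors())
```

A table of vectors like this is cheap to extend, which is part of why such unit tests can sit alongside integration tests rather than compete with them.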



But why? If you have an SSH server running, you can immediately set up a git server. Just run git init --bare; no gigantic web server overhead.

https://rgz.ee/git.html


GitLab has a nice GUI, team features, and CI/CD integration. It's a reasonable choice for a team.

I also remember that there were tools that use Git as a backend for a change review system, and for an issue-tracking system (much like the stuff which Fossil has integrated).

With that, you theoretically don't need a central server at all, as long as you can send patches to each other. In practice, a central server is an important convenience that helps keep the history synchronized between several developers.


Yeah, you could even use recent SSH clients if you leave Gitea, which is written in Go, out of the equation:

https://github.com/go-gitea/gitea/issues/17798


Presumably people value the UI (also, what is the "gigantic web server overhead" you're referring to?).


Ever tried to install GitLab? It's like a 1 GiB compressed package. lol


Gitea is much more lightweight.


It is easy to give people read-only git access without SSH, if you want to share your code with the internet at large?


If you add a git-daemon-export-ok file to the repo, it's accessible read-only over the git protocol. That's how all my repos on https://git.jeskin.net are set up.

https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protoco...
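As a rough sketch of the marker-file approach, the snippet below creates the empty git-daemon-export-ok file in a repository's git directory. The helper names (`git_dir_of`, `allow_export`, `is_exported`) are invented for this example, and it assumes the usual layout: a bare repository is its own git directory, while a repository with a working tree keeps it under `.git`. It does not start or configure git daemon itself.

```python
import tempfile
from pathlib import Path

# Name of the empty marker file that git daemon looks for before serving a repo.
EXPORT_MARKER = "git-daemon-export-ok"


def git_dir_of(repo: Path) -> Path:
    """Return repo/.git for a working tree, or repo itself (assumed bare)."""
    dot_git = repo / ".git"
    return dot_git if dot_git.is_dir() else repo


def allow_export(repo: Path) -> Path:
    """Create the marker file and return its path."""
    marker = git_dir_of(repo) / EXPORT_MARKER
    marker.touch(exist_ok=True)
    return marker


def is_exported(repo: Path) -> bool:
    """Report whether the marker file is present."""
    return (git_dir_of(repo) / EXPORT_MARKER).is_file()


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for a bare repository.
    with tempfile.TemporaryDirectory() as tmp:
        bare = Path(tmp) / "project.git"
        bare.mkdir()
        print(is_exported(bare))
        allow_export(bare)
        print(is_exported(bare))
```

Removing the file again should be enough to stop exporting that repository, assuming the daemon is not run with an option that exports everything.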


Note: transposing "it" and "is" has made my question look sort of passive-aggressive or sarcastic; this was accidental.


The intent was clear from the context, no worries!


> The larger issue is that anyone using GitHub is donating their work for re-use without attribution through Copilot.

Wrong. Let's say a GPL project is not officially hosted on GitHub. I can still easily set up a mirror for it on GitHub, as the GPL doesn't prevent me from doing so...

The point is that anyone can put my work on GitHub, even if I don't want them to, assuming the project is under a free license.


Yes, you can "donate" someone else's code, knowing Microsoft will violate a use-with-attribution license.

You can do it, but it's WRONG.

Copilot is trained on public GitHub repositories of any license: https://en.m.wikipedia.org/wiki/GitHub_Copilot#Technology

We must all stop using GitHub.


I wonder what kind of language you would need to add to your license in order to explicitly forbid the ingestion of your code by Copilot and/or like projects.


> I wonder what kind of language you would need to add to your license in order to explicitly forbid the ingestion of your code by Copilot and/or like projects.

If Microsoft's theory is correct, under US law it is impossible to forbid this with a license, because a license is just an offer of additional permissions beyond what is available automatically under law, and Microsoft's theory is that ingesting code into GitHub is fair use and therefore permitted under law as an exception to copyright without any license from the copyright owner.

If Microsoft's theory is not correct, pretty much any license with an attribution requirement (among others, for other reasons) would work.

The idea that a special license is needed or of any use doesn't seem to have any justification, even in theory. (The same goes for the idea that hosting publicly but not on GitHub changes the legal parameters.)


You are correct.

I just want to say that I believe the fair use argument only applies for training, but not for distribution. I make the case for that in [1].

[1]: https://gavinhoward.com/uploads/copilot.pdf


There's a part that I don't understand. If some software is mirrored on github by someone that isn't the copyright owner, it seems like github shouldn't be able to use it. Yet they said nothing about that specifically. In that case, is the only option to put code somewhere else than github under a license that forbids reuploading to github, and issue DMCAs when/if people reupload your code? It also sounds like when code is removed through DMCA, it should be removed from the training set and they should retrain copilot.


> If some software is mirrored on github by someone that isn't the copyright owner, it seems like github shouldn't be able to use it.

If they don't need permission from the copyright owner, either via license or GitHub T&C, because it's fair use, which is their overt legal theory, then why would it matter legally whether the code was posted to GitHub at all, much less by whom? Ingesting only code from GitHub is a practical convenience that has nothing to do with their legal theory of the right to do it.

> Yet they said nothing about that specifically

Their theory of fair use means they have the right to ingest any code, irrespective of who owns it and what conditions (if any) it is licensed under or where (or even if) it is hosted online. They don't need a separate justification for your scenario if the theory they’ve cited is correct.

> In that case, is the only option to put code somewhere else than github under a license that forbids reuploading to github, and issue DMCAs when/if people reupload your code

Nope, that doesn't help at all, legally; it may help practically as long as they are just using GitHub hosted code and not consuming code from other public hosting platforms, but it has no bearing on their legal theory of why they can ingest code without additional permissions.


Thanks for the clarification. So according to their theory, I could train a model on any code, even private Microsoft code, and that would be okay? That sounds surprising to me.


IANAL, but I have written licenses for that purpose. [1] (I'm trying to get them reviewed by a lawyer, but can't afford to; maybe I'll do a GoFundMe.)

What I did is say that if you feed copyrighted software to an algorithm that itself outputs software, then the license applies to the output. This covers the output of compilers and such, but it would also cover Copilot in my opinion. We'll see what a lawyer says.

However, even with a license, I wouldn't doubt that Microsoft would just put it through GitHub anyway because finding them out would be extraordinarily hard.

[1]: https://yzena.com/licenses/


The “Yzena Copyleft License” states that it's a copyleft license, but it also states that it's not a viral license. According to Wikipedia, a viral license and a copyleft license are the same thing.


Wikipedia is not the best source of information.

There is a difference between "strong" copyleft and "weak" copyleft. An example of "weak" (non-viral) copyleft is the CDDL. In fact, the CDDL's Wikipedia page talks about strong and weak copyleft.

You can read [1] for a breakdown of copyleft by an actual lawyer. Suffice it to say that Wikipedia's Copyleft page is woefully inadequate.

[1]: https://writing.kemitchell.com/2018/10/24/How-to-Speak-Copyl...


If the Wikipedia article is misleading, it probably shouldn't be linked in the license…


I link to Wikipedia as a first port of call. People should always look deeper, and I can't help that.


My team has code, MIT and GPL, on GitHub. We know the risk of this kind of theft. We remain on GitHub for the discoverability. It's not so absolute.


How does that make what 37ef_ced3 said wrong? I’m not following your logic.


> The writing is on the wall. You MUST host your own code on a stand-alone website.

Because this does not prevent the code from landing on GitHub at the end of the day, assuming it is published under a free license.

Now it depends on how you interpret the "MUST". My logic only makes sense if you consider it to be a dogmatic kind of prevention.


The statement you tagged as “wrong” was:

> The larger issue is that anyone using GitHub is donating their work for re-use without attribution through Copilot.

Why is this wrong?


It is wrong as a specific statement about GitHub, because regardless of current practice, the legal theory for Copilot applies to any code anywhere, so anything that is publicly accessible would involve the same risk. It's not dependent on use of GitHub, even if Microsoft has initially started there because it's easier for them.


So this sense of “wrong” is similar to “correct”.

Also, I don’t see how this is true. Code on my website is publicly accessible, but not in the public domain, nor licensed for re-use, unless I say that it is.


Licensing matters for things that would be forbidden without permission by copyright law. Fair use is an exception to copyright law. Microsoft's explicit legal theory around Copilot is that ingesting code for it (and ingesting content to train ML models more generally) is fair use. If their theory is correct, the license is irrelevant: there is no legal (at least copyright-based) barrier to them using any source code they can get their hands on to train Copilot.


A thing I really enjoy about this guide is that it’s close to common C paradigms and practices. Many guides lack this and only show outdated ones.


And another person with an Erdős number of 1 has passed away :(


I miss "Software as a service" as a term


Google Chrome only allows installing extensions from their web store; this would be the end of uBlock Origin for Chrome.

Even though I am a Firefox user, I don't see any sense in using Chrome. The 'features' that Chrome has but Chromium lacks are the ones that we all love:

* DRM
* Centralized extension management


I am the author. I did not create TermGet; I'm just a developer of it. TermGet works differently than sysget. TermGet is a console-based interface, whereas sysget only works with system arguments, which is way more efficient.


I am the author of TermGet. TermGet is sort of like sysget, but it has an interface. I wanted to make TermGet a cross-platform software center for your terminal. I am only 13, and school has made it hard for me to find time for development, so TermGet development has slowed down. When I do have time, TermGet is my favorite project to work on, and I'm probably not going to discontinue it until I run out of features to add. I need more feature ideas because TermGet has so many features that I'm running out of ideas.


Welcome to HN! You should post your work as a Show HN and ask the community for ideas. If you'd like to, email us at [email protected] and we'll give you some tips about how to do it for best results.


Coming soon :)

