Because I had a similar thought when Twitch's code "leaked" and I never got round to asking: is it illegal under US law to download such a leak for private inspection? That is, illegal in a way beyond copyright violation, such as being party to an act of espionage.
(This leak doesn't interest me at all, but if ever the source for QBasic leaked, say.. well I might like to read it without becoming a wanted fugitive ;-))
IANAL, but the risk is generally not in the downloading per se but in the distribution and use. The copyright owner can sue for use or distribution of the IP. So you might be able to download and look, but that would then basically prohibit you from making your own compatible QBasic interpreter. You're better off clean rooming it, generally speaking. So it's basically poisoned: you can look, but as soon as you do you can't touch anything to do with that area again until you can plausibly say you forgot everything you saw.
I would add that if you make a competing product to Microsoft, and they find out that you looked at their source code, they might use that as justification to open up a lawsuit to check if you copied their code at all. Or if they open a lawsuit to see if you copied code, the fact that you looked might be used against you. IANAL so I could be wrong...
Not necessarily. The reason why torrent seeding is so perilous is because P2P is a privacy nightmare, not because they forgot to say the word "download" in the law. If you want to sue people who are copying your work, BitTorrent makes it very easy to get enough information to demand dox from an ISP. For a "mere downloading" case to even occur, you first need to compromise (legally, of course) the host of some direct-download site, subpoena their logs, subpoena a bunch of ISPs, get dox from that, and then sue that class of users. Usually at that point the copying has stopped anyway, which is enough to get a copyright owner to back off.
If such a suit were to happen, the argument would probably boil down to where the infringement actually occurred. Does the server infringe when it sends copyrighted material (because that's where the copy is made), or does the client infringe when it requests said material (because they asked for an infringing copy)? Courts might accept both arguments and just decide everyone is liable.
Also all of that is for copyright, which (usually) covers published works. Trade secrets, which cover things not ever intended to be published, would just call both sides guilty of "misappropriating" the trade secret.
IANAL. If you pirate without distribution (so torrents are out of the question) and don't circumvent any DRM mechanisms (cracks, keygens) then I don't think there's a legal basis for a lawsuit.
However, most games and software come with some form of DRM which you need to bypass to pirate them, and that's often banned explicitly in copyright legislation. It is under the American DMCA laws and, as far as I know, under European copyright laws, but your mileage may vary.
But yes, as far as I can tell, if someone shares their Good Old Games setup files with you and you download them, you're not breaking the law (though the person sharing the content obviously is).
Regardless of actual legality, you can still expect a lawsuit if your company pirates software and defending against that usually costs more than actually buying the software.
Downloading a file which contains copyrighted data is creating a copy (you now have that sequence of bits in memory or on disk on a device you control) without the right having been granted, no?
According to the laws I know of, distribution is prohibited but receiving an illegal copy isn't. The server is offering an illegal copy and that is the main problem in most piracy cases. Usually the person who offers the copy has more resources and acts like a central part in the redistribution, so they're the easiest and most profitable targets of lawsuits. I'm sure that was considered when the laws were written.
But again, I don't know the specifics of copyright law where you live. Perhaps your jurisdiction considers the person accepting the offer of an illegal copy to be a criminal as well.
Even if you argue that the initial copy of the work in RAM in your network stack was created by the server operator, you're likely to make copies from there: copies in RAM if your network stack is not zero copy; copies to disk if these pages swap; and copies to disk if you save the files.
thanks! people go after uploaders, who both created a copy and distributed the copy, the downloader is receiving yet another copy and by nature of the technology is "reproducing" simply by how filesystems work but it seems like the uploader is the center of attention
thoughts on that? It seems that enforcing this against a downloader would require every single piece of media have an explicit limited license for downloading, which isn't practical right now
The repo would be a trade secret and downloading it could potentially be construed as misappropriation of such. However, I know of no case in which someone decided to sue every individual who downloaded a leak out of curiosity. Usually you'd only bother for large businesses that could actually gain a competitive advantage from that data.
Of course, the flip side of that is that anyone who has ever touched the leak is persona non grata in that industry. So if you're interested in learning how QBasic works, and you ever want to touch anything that interprets scripting code, don't touch those leaks.
FWIW disassembling QBasic (instead of obtaining leaked source) would be an absolute defense against a trade secret claim, but in terms of copyright you now have "access", and need to avoid "substantial similarity" in any source code you write. You aren't strictly-speaking "tainted" (clean-room is not a legal requirement), but if someone actually sued you for copying QBasic you'd better have a good legal argument for why every line of your code does not infringe upon the code you disassembled.
I would not download a code leak for a few reasons.
First, I want to be able to contribute code to open source projects, and I feel like seeing some "leaked" code could somehow taint me in a way that makes this more tricky for me.
Second, my employer expects me to act in a manner that reflects positively upon them. I don't think it's fair business dealings to read stolen code from someone who might be a competitor.
Haha, yes. When that came out I asked the guy at MS if QBasic was on the cards next and I seem to recall he said something about it being problematic and not super likely. I spent 1000x longer with QBasic though (QB4.5 and PDS 7.1 technically.)
ReactOS is not based on Microsoft source code. It's a clean-room reimplementation and anyone who has ever looked at Microsoft code is permanently banned from contributing to that project.
all you have to do is run it through an "AI" and then what comes out is definitely not a derivative work and you can legally use it however you like. Microsoft is very confident in this; just see Copilot.
yeah, if they somehow manage to remove the original data, would copyleft be able to "remember" the idea of the code so that if we try to recreate it, would we be able to without modifying the copilot to prevent this?
It'd be interesting (and likely in the public interest) to have some good analysis done of Microsoft's telemetry, e.g. whether there's anything untoward in there, apart from just the forced telemetry itself.
There have been a lot of leaks over the years. W2K and NT4 got leaked in 2004, XP in 2020 [1], and that's just the publicly known leaks where stuff ended up on the 'chan boards or torrent sites. It's more than likely that there have been more leaks (e.g. from one of the academia or government audit programs) that have never been widely dispersed.
And yet, both WINE and ReactOS have refused to use the leaks; ReactOS doesn't even allow people who have worked legitimately at MS in the past to be developers, simply because even the smell of contamination would expose the projects to enormous legal risks.
The only way I could imagine these leaks being useful is by "parallel construction", i.e. by comparing the source code with actual Windows binaries and then the WINE/ReactOS code to spot differences worth checking, and then having a second person investigate the differences with only the note "check function XYZ with implementation in current Windows binary". But that's a lot of effort for very low reward, not to mention you'd need at least two very skilled experts, and the low-hanging fruit was picked long ago.
I use mDNS too frequently to find WSL useful, and was kind of surprised to find MS Edge wouldn't connect to a node service run in WSL. But... if you don't need networking, it's pretty decent. You can even get an X server and run XTerm and get VT320 emulation.
I remember back in the day, I preferred SP2. It seemed to run notably faster on slower hardware and there were very few instances of software (I don't even remember any) that required a later one to run on 2k.
Just think about that. There was a time where you'd not use an older version of an OS, but an older patch level of an OS and it didn't feel particularly wrong.
Bing powers a host of other engines. Yahoo, DuckDuckGo, most of Qwant, AOL, the web results from Windows Search, Ecosia, and others are all dependent on Bing's search APIs.
If you use a Google alternative, there's a good chance it's just Bing under the hood. So this leak could be a pretty big deal.
It's too bad it's always existing companies that get their source leaked. Just once I'd like to see "Archive of Digital Equipment Corporation source code leaked online". I can dream...
> It's too bad it's always existing companies that get their source leaked. Just once I'd like to see "Archive of Digital Equipment Corporation source code leaked online". I can dream...
The problem is those companies are defunct, so their source repositories may not even exist anymore, let alone be online, e.g.:
> After searching the internet exhaustively, I contacted the Computer History Museum and they didn’t have any either. They also informed me that apparently SGI destroyed Cray’s old software archives before spinning them off again in the late 90’s.
I know at my employer, there's always pressure to half-ass things that aren't directly connected to some mechanism for making money. We recently migrated our company intranet site from one vendor to another, and the team that was running that project billed it as a "feature" that they would help us "clean up" by not assisting us in migrating anything older than one year. Similarly, source control migrations (of which we've done several) often drop history, since it's usually way easier just to download the latest version and check it into the new system than to figure out how to migrate the metadata. IIRC, Microsoft's TFS-VC to git migration tool will only migrate something like 180 days of history.
I hope you're not referring to my comment here [1]. Please note, that was, and remains, purely a speculation/hypothesis; see the thread after that. I have no knowledge, firsthand or otherwise, of them exploiting people's build systems.
It saddens me that Microsoft cannot properly implement zero trust principles or account based access control to their DevOps environment. VDI and VPNs are not secure, no networks are secure!
First, I wouldn't be so harsh on them: statistically, the probability of successful attacks increases with the size of the company, and having lived long enough I consider it a miracle that the likes of Apple and Microsoft have seen so few leaks.
Second, zero trust is a very specific concept that basically refers to not trusting networks traditionally considered as more secure, such as corporate LANs. It is definitely not a panacea, not to mention that no large company, including Google, is able to implement it in full without incurring enormous costs.
Third, whatever you do, it's extremely difficult to protect against an inside job. I'm not suggesting that was the case here at all, I'm just saying it's better not to jump to conclusions too hastily.
- based on the perennial Patch Tuesday issues I am surprised it did not happen sooner.
- zero trust is a journey. We should accept that networks cannot be secure and instead look to implement ZT principles away from the network. I like the open source OpenZiti project as a way to put strong identity and ZT principles into our apps. It's not a panacea but it does make access and exploitation much harder.
- correct, though if using attribute-based access controls we can at least massively limit what an insider could get access to... 37GB of source code across multiple different projects at first blush looks like more than what a single user should have access to.
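To make the attribute-based point a bit more concrete, here's a minimal sketch of the idea, not how Microsoft or any real DevOps product does it. All names (`User`, `Repo`, `can_read`, the team and repo names) are hypothetical, and the only attribute checked is team membership. The decision comes from attributes on the user and on the resource, so one compromised account only reaches the repos whose attributes match.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    # Hypothetical user record: the only attribute modelled is team membership.
    name: str
    teams: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Repo:
    # Hypothetical repository record tagged with the team that owns it.
    name: str
    owning_team: str


def can_read(user: User, repo: Repo) -> bool:
    """Allow read access only when the user belongs to the repo's owning team."""
    return repo.owning_team in user.teams


# Made-up repositories and a made-up user, purely for illustration.
repos = [
    Repo("search-frontend", "search"),
    Repo("assistant-core", "assistant"),
    Repo("maps-tiles", "maps"),
]
alice = User("alice", frozenset({"search"}))

# A stolen "alice" credential exposes one repo, not the whole estate.
readable = [r.name for r in repos if can_read(alice, r)]
print(readable)
```

A real policy would look at more attributes (device posture, time, data classification), but the effect is the same: the blast radius of a single account is bounded by policy rather than by network location.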
> I work in the cyber insurance industry. This is not true.
Really? I mean our small company has never had our codebase breached and released by hackers, while Microsoft and their subsidiaries have had this happen several times. Similarly Twitch, Github, Valve... all have suffered source code breaches similar to the article.
None of the small companies I have ever worked for have had this issue, so it does seem that large tech companies have a higher probability of having their codebase successfully leaked.
We are also talking about Microsoft, which is probably amongst the top companies that are targeted the most by hackers across the world (mainly because of the impact when they are breached, rather than the ease of breach).
I assume when OP talks about the likelihood of a successful breach, they don't mean the success % of a breach, they mean the total number of successful breaches. When I worked at a big company the security team seemed to be putting out small fires all the time, with targeted phishing attacks, so many laptops that could have missed an update, virtual machines getting ransomware, etc. Now that I work for a smaller company and look after their IT as part of my role, we have only had 1-2 fairly small issues across the last year.
I thought they tried that already with Windows Vista. I mean... it seemed to at least share a lot with Blockchain since the initial release had more variations than anyone cared to track, was slow, confusing, expensive, and hardly anyone used it despite all of the hype....
So 37GB of source code is clearly a lot, but for someone unenlightened what kind of size are we looking at for the bigger projects? For example, Windows, Office, Exchange.
Interesting. Presumably, LAPSUS had access to Windows source code but still decided to go after comparatively low-fruit like Bing and Cortana instead of the digital gold that is Windows.
Wasn't Windows' source code leaked a number of times already?
I think Bing and Cortana will have some "algorithms" that might be worth a lot more for the right buyer. I mean Google's search algorithm is one of the best kept secrets in the industry.
> I know they've made it available to some universities and large customers also can get access.
And, IIRC, infamously the Chinese government too, because they made it a pre-condition of them purchasing Microsoft licenses that they must have source code visibility.
Well, there is for sure a lot to criticize about the CN government, but this precondition seems to me very natural (the OS is a very natural place for possible backdoors, otherwise...)
Windows source code is fairly widely available, as in government agencies, universities and others already have access. I'm sure this means anyone motivated enough could get it if they really wanted to. Of course even looking at it is problematic if you want to work on open source operating systems later, so I'm not sure why you'd voluntarily choose to do this.
It can be useful even if you're not directly incorporating it into your code. For example you might want stronger guarantees than the API documentation offers (e.g. "this function will only ever return values between this and that in this particular version of Windows"), and being able to read the code to check if your assumptions are valid is very useful. I've worked in function hooking before and ReactOS has been a very useful resource on occasion.
When public documentation for the Hyper-V APIs sucks the way that it does, I'd be willing to risk not being able to write an operating system later if I could figure out a side project now ;)
I'm thinking of the HCS docs (https://docs.microsoft.com/en-us/virtualization/api/hcs/over...). There's very poor documentation of the different types of VMs/Containers you can launch and how to launch them, I'm not sure how much of this is intentional or due to the newer container APIs being too new, but it's super frustrating when you're trying to understand how WSL2 or Windows Sandbox work (or honestly, how to use Windows containers without Docker).
I think you misspelled "dumpster fire". Microsoft is known for going to extreme lengths to maintain backward compatibility, but for Windows in particular this means code that's been hacked on for decades.