Lapsus$ hackers leak 37GB of Microsoft's alleged source code (bleepingcomputer.com)
243 points by chillax on March 22, 2022 | 119 comments


Because I had a similar thought when Twitch's code "leaked" and I never got round to asking: is it illegal under US law to download such a leak for private inspection? That is, illegal in a way beyond copyright violation, such as being party to an act of espionage.

(This leak doesn't interest me at all, but if ever the source for QBasic leaked, say.. well I might like to read it without becoming a wanted fugitive ;-))


IANAL, but the risk is generally not in the downloading per se but in the distribution and use. The copyright owner can sue for use or distribution of the IP. So you might be able to download and look, but that would then basically prohibit you from making your own compatible QBasic interpreter. You're better off clean rooming it, generally speaking. So it's basically poisoned: you can look, but as soon as you do you can't touch anything to do with that area again until you can plausibly say you forgot everything you saw.


I would add that if you make a competing product to Microsoft, and they find out that you looked at their source code, they might use that as justification to open up a lawsuit to check if you copied their code at all. Or if they open a lawsuit to see if you copied code, the fact that you looked might be used against you. IANAL so I could be wrong...


So this technically makes piracy legal, as long as I'm not distributing the material (or seeding its torrent).


Not necessarily. The reason why torrent seeding is so perilous is because P2P is a privacy nightmare, not because they forgot to say the word "download" in the law. If you want to sue people who are copying your work, BitTorrent makes it very easy to get enough information to demand dox from an ISP. For a "mere downloading" case to even occur, you first need to compromise (legally, of course) the host of some direct-download site, subpoena their logs, subpoena a bunch of ISPs, get dox from that, and then sue that class of users. Usually at that point the copying has stopped anyway, which is enough to get a copyright owner to back off.

If such a suit were to happen, the argument would probably boil down to where the infringement actually occurred. Does the server infringe when it sends copyrighted material (because that's where the copy is made), or does the client infringe when it requests said material (because they asked for an infringing copy)? Courts might accept both arguments and just decide everyone is liable.

Also all of that is for copyright, which (usually) covers published works. Trade secrets, which cover things not ever intended to be published, would just call both sides guilty of "misappropriating" the trade secret.


IANAL. If you pirate without distribution (so torrents are out of the question) and don't circumvent any DRM mechanisms (cracks, keygens) then I don't think there's a legal basis for a lawsuit.

However, most games and software come with some form of DRM which you need to bypass to pirate them, and that's often banned explicitly in copyright legislation. It is under the American DMCA laws and, as far as I know, under European copyright laws, but your mileage may vary.

But yes, as far as I can tell, if someone shares their Good Old Games setup files with you and you download them, you're not breaking the law (though the person sharing the content obviously is).

Regardless of actual legality, you can still expect a lawsuit if your company pirates software and defending against that usually costs more than actually buying the software.


Downloading a file which contains copyrighted data is creating a copy (you now have that sequence of bits in memory or on disk on a device you control) without the right having been granted, no?


According to the laws I know of, distribution is prohibited but receiving an illegal copy isn't. The server is offering an illegal copy and that is the main problem in most piracy cases. Usually the person who offers the copy has more resources and acts like a central part in the redistribution, so they're the easiest and most profitable targets of lawsuits. I'm sure that was considered when the laws were written.

But again, I don't know the specifics of copyright law where you live. Perhaps your jurisdiction considers the person accepting the offer of an illegal copy to be a criminal as well.


The person running the server is the one who is creating the copy and doing the distribution, not you the user.


Even if you argue that the initial copy of the work in RAM in your network stack was created by the server operator, you're likely to make copies from there: copies in RAM if your network stack is not zero copy; copies to disk if these pages swap; and copies to disk if you save the files.

See e.g. https://scholarship.law.edu/cgi/viewcontent.cgi?article=1602...


Dubious, when the user is the one initiating the copy (by requesting the download) and the one benefiting from the copy.


Copyright infringement requires sharing the copy without a license to do so, not making the copy


The right to make copies (reproduce a work) and distribute copies (sharing) are both separately protected by copyright law [1].

[1] https://www.bitlaw.com/copyright/scope.html


thanks! people go after uploaders, who both created a copy and distributed the copy, the downloader is receiving yet another copy and by nature of the technology is "reproducing" simply by how filesystems work but it seems like the uploader is the center of attention

thoughts on that? It seems that enforcing this against a downloader would require every single piece of media have an explicit limited license for downloading, which isn't practical right now


IANAL, but no. You can still get sued.


You can be sued for anything. I could sue you for farting in an elevator.

You have to actually show damages, and nobody is going to care about personal research into obsolete software.


except oracle.


This is the case in Canada, at least.


Switzerland, too.


The repo would be a trade secret and downloading it could potentially be construed as misappropriation of such. However, I know of no case in which someone decided to sue every individual who downloaded a leak out of curiosity. Usually you'd only bother for large businesses that could actually gain a competitive advantage from that data.

Of course, the flip side of that is that anyone who has ever touched the leak is persona non grata in that industry. So if you're interested in learning how QBasic works, and you ever want to touch anything that interprets scripting code, don't touch those leaks.

FWIW disassembling QBasic (instead of obtaining leaked source) would be an absolute defense against a trade secret claim, but in terms of copyright you now have "access", and need to avoid "substantial similarity" in any source code you write. You aren't strictly-speaking "tainted" (clean-room is not a legal requirement), but if someone actually sued you for copying QBasic you'd better have a good legal argument for why every line of your code does not infringe upon the code you disassembled.


> but if someone actually sued you for copying QBasic

Not to take away from your useful points, but that would be quite the case in 2022! :-)


I would not download a code leak for a few reasons.

First, I want to be able to contribute code to open source projects, and I feel like seeing some "leaked" code could somehow taint me in a way that makes this more tricky for me.

Second, my employer expects me to act in a manner that reflects positively upon them. I don't think it's fair business dealings to read stolen code from someone who might be a competitor.


> but if ever the source for QBasic leaked

QBasic? - Why that modern thing. Just use the proper BASIC: https://github.com/microsoft/GW-BASIC ;)


Haha, yes. When that came out I asked the guy at MS if QBasic was on the cards next and I seem to recall he said something about it being problematic and not super likely. I spent 1000x longer with QBasic though (QB4.5 and PDS 7.1 technically.)


If you want anything clean room related in your career, I wouldn't read any of it.


If it was like the mainsoft leak, I learned a hell of a lot about what not to do reading that.


I don't want their compiled code let alone their source code!


I could imagine a LibreWindows where all the bloatware, advertising, microsoft backdoors and so on are completely removed from the source.


Russia is funding ReactOS.


ReactOS is not based on Microsoft source code. It's a clean-room reimplementation and anyone who has ever looked at Microsoft code is permanently banned from contributing to that project.


How do they know that their contributors never took a look at Microsoft source code?


This was a major factor in the Oracle v. Google case, where some of the argument boiled down to "well, how else would you do that?".

ReactOS audited their code and showed that none of it was Microsoft's.


Honor based system


With what money? :P


Microsoft has been a supporter of open source lately anyway.


no sane org would touch this with a 10 foot pole


all you have to do is run it through an "AI" and then what comes out is definitely not a derivative work and you can legally use it however you like. Microsoft is very confident in this; just see Copilot.


And it would even be open-source copyleft [0] because the "AI" can't claim its copyright either! Checkmate, patent trolls

[0] https://www.theverge.com/2022/2/21/22944335/us-copyright-off...


yeah, if they somehow manage to remove the original data, would Copilot still be able to "remember" the idea of the code, so that we could recreate it unless Copilot were modified to prevent this?


It'd be interesting (and likely in the public interest) to have some good analysis done of Microsoft's telemetry, e.g. whether there's anything untoward in there, apart from just the forced telemetry itself.


But with some clean room due diligence this could be a boon to WINE.


There have been a lot of leaks over the years. W2K and NT4 got leaked in 2004, XP in 2020 [1], and that's just the publicly known leaks where stuff ended on the 'chan boards or torrent sites. It's more than likely that there have been more leaks (e.g. from one of the academia or government audit programs) that have never been widely dispersed.

And yet, both WINE and ReactOS have refused to use the leaks; ReactOS doesn't even allow people who have worked legitimately at MS in the past to be developers, simply because even the smell of contamination would expose the projects to enormous legal risks.

The only way I could imagine these leaks being useful is by "parallel construction", i.e. one person compares the leaked source code with actual Windows binaries and with the WINE/ReactOS code to spot differences worth checking, and a second person then investigates each difference with only the note "check function XYZ against the implementation in the current Windows binary". But that's a lot of effort for very low reward, not to mention that you'd need at least two very skilled experts and that the low-hanging fruit was picked long ago.

[1] https://borncity.com/win/2020/10/01/entwickler-compiliert-wi...

[2] https://reactos.org/forum/viewtopic.php?t=20189


and does not matter if they like it or not /s


If any of the hackers are here, could you run this on the Windows repo please?

  git reset --hard windows2000-sp4
/s


Make it win 7 and I'd agree.

(Actually I don't care anymore. Modern linux beats any version of windows in all my use cases)


> Modern linux beats any version of windows in all my use cases

Are you aware that hackers have been leaking the source code of Linux for years?


You can find so many modded versions though...


> Modern linux beats any version of windows in all my use cases

It's the other way around for me. Windows Subsystem for Linux ftw.

I have used Linux as the primary Desktop OS for most of my life but I hope I will never have to go back ever again.


i use mDNS too frequently to find WSL useful. and was kind of surprised to find MS Edge wouldn't connect to a node service run in WSL. but.... if you don't need networking, it's pretty decent. you can even get an x server and run XTerm and get VT320 emulation.


What you say about networking is not true. Networking works seamlessly between native windows applications and linux applications running on WSL.


Make it vista and we're golden


Not sure if Vista was the low point or Windows ME. Probably Vista.


Wrong verb tense there. The low point is windows 11.


95 and 98 were horrible. Hearing the error chime still gives me physical reactions.


Agree. UAC is pretty basic feature.


Lol ok


I remember back in the day, I preferred SP2. It seemed to run notably faster on slower hardware and there were very few instances of software (I don't even remember any) that required a later one to run on 2k.

Just think about that. There was a time where you'd not use an older version of an OS, but an older patch level of an OS and it didn't feel particularly wrong.


> There was a time where you'd not use an older version of an OS, but an older patch level of an OS and it didn't feel particularly wrong.

Unless you had a network of these, like a school, and a couple enterprising pranksters with Metasploit.


Granted, from now on all your electronic devices will run Windows 2000 IA-64.


>all your electronic devices

Cool. Having the only Itanium powered oven/fridge/washer gives him insane nerd street cred.


I can understand the oven, but surely being "Itanium-powered" would be counterproductive for a fridge?


Your fridge yes, your oven no.


Are you sure you don't just want vms instead?


i miss VMS. it's sort of like WinNT but with all the stuff that doesn't work stripped out.


windowsnt4-sp6a


Bing and Cortana, the two most important and best working Microsoft products.


Bing powers a host of other engines. Yahoo, DuckDuckGo, most of Qwant, AOL, the web results from Windows Search, Ecosia, and others are all dependent on Bing's search APIs.

If you use a Google alternative, there's a good chance it's just Bing under the hood. So this leak could be a pretty big deal.


There is a reason why they hide that code ^^


My guess is that they did not leak Windows and Office source code because they want to mine it for 0days themselves.

Time to really lock things down folks.


Or simply because (as the screenshot shows) they only got access to Azure through an account that doesn't have access to the OS department of Microsoft


It's too bad it's always existing companies that get their source leaked. Just once I'd like to see "Archive of Digital Equipment Corporation source code leaked online". I can dream...


> It's too bad it's always existing companies that get their source leaked. Just once I'd like to see "Archive of Digital Equipment Corporation source code leaked online". I can dream...

The problem is those companies are defunct, so their source repositories may not even exist anymore, let alone be online, e.g.:

http://www.chrisfenton.com/homebrew-cray-1a/

> After searching the internet exhaustively, I contacted the Computer History Museum and they didn’t have any either. They also informed me that apparently SGI destroyed Cray’s old software archives before spinning them off again in the late 90’s.

I know at my employer, there's always pressure to half-ass things that aren't directly connected to some mechanism for making money. We recently migrated our company intranet site from one vendor to another, and the team that was running that project presented it as a "feature" that they would help us "clean up" by not assisting us in migrating anything older than one year. Similarly, source control migrations (of which we've done several) often drop history, since it's usually way easier just to download the latest version and check it into the new system than to figure out how to migrate the metadata. IIRC, Microsoft's TFS-VC to git migration tool will only migrate something like 180 days of history.


Sooo, are all of these leaks directly tied to the Okta access they had? That’s an absolute bummer for business if so. Sheesh.


no, the leaks are separate


Oh no, what have I done!

I hope you're not referring to my comment here [1]. Please note, that was, and remains, purely a speculation/hypothesis; see the thread after that. I have no knowledge, firsthand or otherwise, of them exploiting people's build systems.

[1]: https://news.ycombinator.com/item?id=30763269


> I have no knowledge, firsthand or otherwise

Sus !


It saddens me that Microsoft cannot properly implement zero trust principles or account based access control to their DevOps environment. VDI and VPNs are not secure, no networks are secure!


First, I wouldn't be so harsh on them: statistically, the probability of successful attacks increases with the size of the company, and having lived long enough I consider it a miracle that the likes of Apple and Microsoft have seen so few leaks.

Second, zero trust is a very specific concept that basically refers to not trusting networks traditionally considered as more secure, such as corporate LANs. It is definitely not a panacea, not to mention that no large company, including Google, is able to implement it in full without incurring enormous costs.

Third, whatever you do, it's extremely difficult to protect against an inside job. I'm not suggesting that was the case at all, I'm just saying it's better not to jump to conclusions too hastily.


Agreed, with a couple of caveats:

- based on the perennial Patch Tuesday issues I am surprised it did not happen sooner.

- zero trust is a journey. We should accept that networks cannot be secure and instead look to implement principles of ZT away from the network. I like the open source OpenZiti project as a way to put strong identity and ZT principles into our apps. It's not a panacea but it does make access and exploits much harder.

- correct, though if using attribute-based access controls we can at least massively limit what an insider could get access to... 37GB of source code across multiple different projects at first blush looks like more than what a single user should have access to.


Or maybe even Microsoft has the odd one or two guys who like to check in their nuget packages dir.


> statistically, the probability of successful attacks increases with the size of the company

I work in the cyber insurance industry. This is not true.


> I work in the cyber insurance industry. This is not true.

Really? I mean our small company has never had our codebase breached and released by hackers, while Microsoft and their subsidiaries have had this happen several times. Similarly Twitch, Github, Valve... all have suffered source code breaches similar to the article.

None of the small companies I have ever worked for have had this issue, so it does seem that large tech companies have a higher probability of having their codebase successfully leaked.

We are also talking about Microsoft, which is probably amongst the top companies that are targeted the most by hackers across the world (mainly because of the impact when they are breached, rather than the ease of breach).

I assume when OP talks about the likelihood of a successful breach, they don't mean the success % of a breach, they mean the total number of successful breaches. When I worked at a big company the security team seemed to be putting out small fires all the time, with targeted phishing attacks, so many laptops that could have missed an update, virtual machines getting ransomware, etc. Now that I work for a smaller company and look after their IT as part of my role, we have only had 1-2 fairly small issues across the last year.


These are mostly social engineering attacks. As employee headcount increases, so do attack vectors.



So you're saying Microsoft should have put their source code on a blockchain?


I thought they tried that already with Windows Vista. I mean... it seemed to at least share a lot with Blockchain since the initial release had more variations than anyone cared to track, was slow, confusing, expensive, and hardly anyone used it despite all of the hype....


I think you mean their DevOops environment.


> alleged source code

It may just be line noise, but it compiles and runs just as well.


So, perl?


fave Perl dis; "an explosion happened at the ASCII factory"


So 37GB of source code is clearly a lot, but for someone unenlightened what kind of size are we looking at for the bigger projects? For example, Windows, Office, Exchange.



Interesting. Presumably, LAPSUS had access to Windows source code but still decided to go after comparatively low-fruit like Bing and Cortana instead of the digital gold that is Windows.


Wasn't Windows' source code leaked a number of times already?

I think Bing and Cortana will have some "algorithms" that might be worth a lot more for the right buyer. I mean Google's search algorithm is one of the best kept secrets in the industry.


Windows source isn't all that hard to see. I know they've made it available to some universities and large customers also can get access.

https://www.microsoft.com/en-us/sharedsource/enterprise-sour...


> I know they've made it available to some universities and large customers also can get access.

And, IIRC, infamously the Chinese government too, because they made it a pre-condition of them purchasing Microsoft licenses that they must have source code visibility.


"Infamously" ?

Well, there is for sure a lot to criticize about the CN government, but this precondition seems to me very natural (the OS is a very natural place for possible backdoors, otherwise...)


No government should buy closed source anything for obvious reasons.


> because they made it a pre-condition of them purchasing Microsoft licenses that they must have source code visibility.

Even the communist party in China is more up to date than my own government


10,000 enterprise windows licenses minimum... that's a big customer


Windows source code is fairly widely available, as in government agencies, universities and others already have access. I'm sure this means anyone motivated enough could get it if they really wanted to. Of course even looking at it is problematic if you want to work on open source operating systems later, so I'm not sure why you'd voluntarily choose to do this.


It can be useful even if you're not directly incorporating it into your code. For example you might want stronger guarantees than the API documentation offers (e.g. "this function will only ever return values between this and that in this particular version of Windows"), and being able to read the code to check if your assumptions are valid is very useful. I've worked in function hooking before and ReactOS has been a very useful resource on occasion.


When public documentation for the Hyper-V APIs sucks the way that it does, I'd be willing to risk not being able to write an operating system later if I could figure out a side project now ;)


Full disclosure, I work on Hyper-V. Are you thinking of these docs - https://docs.microsoft.com/en-us/virtualization/api/hypervis... - or something else?


I'm thinking of the HCS docs (https://docs.microsoft.com/en-us/virtualization/api/hcs/over...). There's very poor documentation of the different types of VMs/Containers you can launch and how to launch them, I'm not sure how much of this is intentional or due to the newer container APIs being too new, but it's super frustrating when you're trying to understand how WSL2 or Windows Sandbox work (or honestly, how to use Windows containers without Docker).


I think you misspelled "dumpster fire". Microsoft is known for going to extreme lengths to maintain backward compatibility, but for Windows in particular this means code that's been hacked on for decades.


Whilst it might be treasure for a hacker, for a "learner" I'd say any version of Windows is horrible precisely due to this baggage.


MS has multiple devops orgs separated by location, I believe. Judging by the screenshot where it says MSASG, I believe that's the China/Asia one.

It does have other orgs in the screenshot, but all the leaks seem to be ASG related.


Windows source code must be huuuuge...


Well it depends if it's a snapshot or clone or the repositories themselves. It could be a whole load of nothing that changes a lot.


It's crazy the leaking group is offering people who work at likely targets money.

Like, what's the exit strategy there? They use your credentials, leak stuff, and you take the fall?


Anything interesting in this leak?


Does anyone actually use Cortana and Bing?


Bing's pretty decent when you want to search the web for your query without Google attempting to know what you're looking for better than you do.


A lot of people use DDG which mainly uses Bing.


True.


It would be fun if a backdoor was found in there.


cool. we can fix it now.


Anything badass...



