The key to this attack is: "The result of these settings is that, by default, any repository contributor can execute code on the self-hosted runner by submitting a malicious PR."
Problem: you need to be a "contributor" to the repo for your PR to trigger workflows without someone approving them first.
So: "We needed to be a contributor to the PyTorch repository to execute workflows, but we didn’t feel like spending time adding features to PyTorch. Instead, we found a typo in a markdown file and submitted a fix."
I really don't like this aspect of GitHub, that people who have submitted a typo fix gain additional privileges on the repo by default. That's something GitHub could fix: I think "this user gets to trigger workflows without approval in the future" should be an active button repo administrators need to click. Maybe in the PR flow there could be "Approve this run" and "Approve this run and all future runs by user X" buttons.
The vast majority of repos should be able to run CI on pull requests with no privileges at all. GitHub can manage any resource utilization issues on their end.
Is the issue here that a self-hosted runner was needed for some hardware tests?
The problem is that there are fundamentally 2 different kinds of builds, but the current tooling is weak:
* pre-merge builds on PRs. These should not have privileges, but making the distinction between the two cases requires a lot of care.
* official builds of "master" or a feature branch from the main repo. These "need" privileges to upload the resulting artifacts somewhere. Of course, if all it did was wake up some daemon elsewhere, which could download straight from the CI in a verified way based on the CI's notion of the job name, it would be secure without privileges, but most CI systems don't want to preserve huge artifacts, and maintaining the separate daemon is also annoying.
If you're using GHA to publish then you still need a trusted branch to provide secrets to. If you're not publishing using CI, then you can still upload to PyPI manually with an API Token.
> without needing to give microsoft total access to uploading to pypi
I assume you're referring to Trusted Publishers here? It's a per-project configuration using the industry standard of OIDC that you have to explicitly opt in to, so "total access" is a silly characterization. Also, if you're insinuating that MS is going to generate fraudulent OIDC tokens to compromise a PyPI package, then you might want to start weaning yourself off the kool-aid.
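For anyone who hasn't seen it, this is roughly what Trusted Publishing looks like in a workflow; a minimal sketch, with the job and environment names being illustrative. The key part is the `id-token: write` permission, which lets PyPI verify the workflow's OIDC token against the project's configured publisher instead of a long-lived API token:

```yaml
name: release

on:
  release:
    types: [published]

jobs:
  pypi-publish:
    runs-on: ubuntu-latest
    # Optional: bind publishing to a protected environment so only
    # approved runs can reach this job.
    environment: pypi
    permissions:
      id-token: write   # required for the OIDC exchange with PyPI
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Build sdist and wheel
        run: python -m pip install build && python -m build
      - name: Publish via Trusted Publishing (no stored API token)
        uses: pypa/gh-action-pypi-publish@release/v1
```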
Most likely the plan is to make it compulsory for all projects eventually, just like they made 2fa compulsory.
So less secure is considered MORE secure by PyPI :) Which is consistent with the idea that no PGP signature is more secure than signed uploads. Or the idea that a global token in a clear-text file is somehow safer than a password that gets typed every time.
Ideally the builds for external PRs should not gain any access to any secret. But it is not practical. For example, you may want to use docker containers. Then you will need to download docker images from somewhere. For example, downloading ubuntu:20.04 from docker hub. That one requires a token, otherwise your request may get throttled. Even accessing most Github services requires a token. I agree with you that these things require a lot of care. In reality most dev teams are not willing to put enough time on securing their build system. That's why supply chain attacks are so common today.
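As a concrete illustration of that constraint, here's a hedged sketch: the login step only runs for same-repo PRs, because fork PRs on the plain pull_request trigger get no secrets at all, and those builds fall back to anonymous, rate-limited pulls (the DOCKERHUB_* secret names are assumptions, not anything from the article):

```yaml
on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Fork PRs receive no repository secrets on the pull_request
      # trigger, so this step is skipped for them.
      - name: Log in to Docker Hub (same-repo PRs only)
        if: github.event.pull_request.head.repo.full_name == github.repository
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}  # assumed secret name
          password: ${{ secrets.DOCKERHUB_TOKEN }}     # assumed secret name
      - run: docker pull ubuntu:20.04
```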
Absolutely. The real difficulty is that tests on PRs are by definition remote code execution from an untrusted source, so a full risk analysis and hardening need to be done.
PyTorch did use GH secrets for the valuable credentials, and you can see right there in the OP that this wasn't enough, because the self-hosted runners are still shared.
Yup! This is what makes this kind of attack scary and very unique to GitHub Actions. The baseline GITHUB_TOKEN just blows the door open on lateral movement via workflow_dispatch and repository_dispatch events.
In several of our other operations, not just PyTorch, we leveraged workflow_dispatch to steal a PAT from other workflows. Developers over-provision PATs so often. More often than not we'd end up with a PAT that has all scopes checked and org admin permissions. With that one could clean out all of the secrets from an organization in minutes using automated tools such as https://github.com/praetorian-inc/gato.
Correct. For fork PR workflows on the pull_request trigger the GITHUB_TOKEN has read only permissions, so you can’t do anything with it.
The key thing with a non-ephemeral runner is that (after obtaining persistence) you can grab the GITHUB_TOKEN from a subsequent non-fork PR build or a build on another trigger, which will have write permissions unless restricted by the repository maintainers.
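For reference, restricting that default is a one-liner at the top of the workflow (a minimal sketch; the build step is a placeholder). The same read-only default can also be set repository-wide under Settings → Actions → General → Workflow permissions.

```yaml
# Default the GITHUB_TOKEN for every job in this workflow to read-only;
# individual jobs can request more if (and only if) they need it.
permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/build.sh   # placeholder build step
```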
I think 'environments' was meant, where different GHA environments get different secrets, and policies dictate who gets to run which actions with which envs.
But that is real work to setup, audit, and maintain. It'd be better if, like phone app capabilities, the default would be no privs, any privs are explicitly granted, and if they aren't being used, the system detects that and asks if you want to remove specific ones.
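For concreteness, a minimal sketch of that pattern (the environment and secret names are made up): secrets scoped to an environment are only injected into jobs that reference it, and the environment's protection rules decide when such a job may run.

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    # Only this job sees secrets scoped to the "production" environment,
    # and the environment's protection rules (required reviewers, branch
    # restrictions) gate when it runs.
    environment: production
    steps:
      - run: ./deploy.sh   # placeholder deploy step
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}   # hypothetical secret
```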
Ouch! This is why ephemeral runners should be used. Preferably virtual machines. On an infrastructure that can define security group rules to prevent lateral movement.
>The vast majority of repos should be able to run CI on pull requests with no privileges at all
When there are no side effects and no in-container secrets and the hosting is free or reasonably limited to prevent abusers, ideally yes.
Outside that, heck no, that'd be crazy. You're allowing randos to run arbitrary code on your budget. Locking it down until it's reviewed is like step 1, they can validate locally until then.
They are so difficult. I wanted to stop random people from running code on my repository… I don't have any secrets or write access or anything to exploit; it was just to avoid burning quota.
The issue is that now the pull requests don't get tested at all. I have to manually fetch all the commits locally, make a branch on the main repository with them, and only then do the actions run.
That just plain sounds Bad™, yeah. In the buildkite setup I interact with, I can at least just hit the "continue" button on the paused pipeline, regardless of any other GitHub state... which feels like table stakes to me.
My prior is that GitHub is pretty competent in understanding malicious PRs to open source repos, and wouldn’t penalize the repo owner without other evidence of wrongdoing.
GitHub themselves don't seem to provide a complete mechanism for ephemeral runners. It looks like all they allow you to do is flag a runner as ephemeral, meaning it will be de-registered once a job is completed; you need to write your own tooling to wipe the environment yourself (either by starting a whole new runner in a new environment and registering that, or by wiping the existing runner and re-registering it).
I've just made runs-on [1] for that purpose: self-hosted, ephemeral runners for GitHub Action workflows. Long-running self-hosted runners are simply too risky if your project is public.
(1) disclosure, maintainer
(2) zero implicit trust in this case = no open inbound ports on underlay; need to access via app-specific overlay which requires strong identity, authN, authZ
The Kubernetes implementation owned by GitHub [1] uses ephemeral runners by default. You can also specify what network access the runners should have using regular network policies provided by Kubernetes. So, if you have a Kubernetes cluster, that's the way to go.
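As a sketch of the network-policy part (the namespace and port choices are assumptions, not anything the controller ships by default): a policy like this blocks all inbound traffic to runner pods and limits egress to DNS and HTTPS, which cuts off the kind of lateral movement described in the article.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-runner-traffic
  namespace: arc-runners        # assumed runner namespace
spec:
  podSelector: {}               # every runner pod in the namespace
  policyTypes:
    - Ingress
    - Egress
  ingress: []                   # no inbound connections to runners
  egress:
    - ports:                    # allow DNS lookups
        - protocol: UDP
          port: 53
    - ports:                    # allow outbound HTTPS (GitHub, registries)
        - protocol: TCP
          port: 443
```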
They do care about this stuff though - a while ago I pointed out that `pr_closed` was run with the workflows from the PR(^); it was fixed promptly.
^So you could reject a PR that tries to add a bitcoin-mining workflow only to have it start bitcoin-mining on close. Or leaking secrets, releasing nonsense, etc. (I don't recall if non-/contributor status was also a factor. I think it was also ignoring that check.)
This is more subtle, but there is an “author_association” field within Actions event contexts that can be one of:
NONE, CONTRIBUTOR, COLLABORATOR, MEMBER, OWNER
There are some cases where people use checks on that field as part of gating for workflows that run on pull_request_target/issue_comment, but might confuse CONTRIBUTOR with COLLABORATOR (which requires explicitly adding someone to the repository). Ultimately this is a misconfiguration on the part of the maintainer, but it's another example where fixing a typo can play a part in an attack.
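For anyone wiring up such a gate, here's a minimal sketch of the safer check (the job body is a placeholder): only COLLABORATOR/MEMBER/OWNER pass, while CONTRIBUTOR, which a merged typo fix is enough to earn, does not.

```yaml
on:
  issue_comment:
    types: [created]

jobs:
  privileged-automation:
    # CONTRIBUTOR only means a previously merged commit (e.g. a typo fix),
    # so it is deliberately excluded from this allow-list.
    if: contains(fromJSON('["COLLABORATOR", "MEMBER", "OWNER"]'), github.event.comment.author_association)
    runs-on: ubuntu-latest
    steps:
      - run: echo "running privileged automation"   # placeholder step
```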
Recently, there were two similar attempts at supply chain attacks on the ClickHouse repository, but:
- they didn't do anything, because CI does not run without approval;
- the user's account magically disappeared from GitHub, along with all their pull requests, within a day.
Also, let me recommend our bug bounty program: https://github.com/ClickHouse/ClickHouse/issues/38986 It sounds easy - pick your favorite fuzzer, find a segfault (it should be easy because C++ isn't a memory-safe language), and get your paycheck.
- always pin versions of all packages;
- this includes OS package repositories, Docker repositories, as well as pip, npm, cargo, and others;
- never download anything from the master/main or other branches; specify a commit sha (a workflow sketch follows this list);
- ideally, copy all Docker images to our own private registry;
- ideally, calculate hashes after download and compare them with what was before;
- frankly speaking, if CI runs air-gapped, it would be much better...
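A minimal workflow-side sketch of the pinning points above (the image digest and the action commit SHA are placeholders, not real values; pin to values you have verified yourself):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    container:
      # Pin the image by digest rather than by a mutable tag.
      image: ubuntu@sha256:<digest-recorded-at-review-time>
    steps:
      # Pin third-party actions to a full commit SHA instead of a branch
      # or tag that can be moved later.
      - uses: actions/checkout@<full-commit-sha>   # instead of @main / @v4
```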
I don't understand this PR. How is it an "attack"? It seems to just be pinning a package version, was the package compromised, or was this more a "vulnerability"?
If they're pulling from master instead of from a known version, it could be changed to be malicious, and the next time it is fetched, the malicious version would be used instead. It's a vulnerability.
Oh, you'll like this one then. Until 3 months ago, GitHub's runner images were pulling a package directly from Aliyun's CDN. This was executed during image testing (a version check). So anyone with the ability to modify Aliyun's CDN in China could have carried out a pretty nasty attack. https://github.com/actions/runner-images/commit/6a9890362738...
Now it's just anyone with write access to Aliyun's repository. :) (p.s. GitHub doesn't consider this a security issue).
Whoa, there's a lot of stuff in there [1] that gets installed straight from vendors, without pinning content checksums to a value known-good to Github.
I get it, they want to have the latest versions instead of depending on how long Ubuntu (or, worse, Debian) package maintainers take to package stuff into their mainline repositories... but creating this attack surface is nuts. Imagine being able to compromise just one of the various small tools they embed, and pivoting from there to all GitHub runners everywhere (e.g. by overwriting /bin/bash or any other popular entrypoint, or even libc itself, with a malware payload).
The balance there is overwhelmingly in favor of usability and having new tools (hence the 1 week deployment cadence). Maybe there is some process they have to go over everything before it makes it into the production pool, but that’s quite an undertaking to perform properly every week.
> on how long Ubuntu (or, worse, Debian) package maintainers take to package stuff into their mainline repositories...
Really a misinformed comment.
For starters Ubuntu is for the most part a snapshot of Debian sid, so except for a few cases it will not have more modern versions.
The python packaging team is really hard working… In most cases stuff that doesn't get updated immediately is because it breaks something or depends on something new that isn't packaged yet.
Please stop demeaning the work of others when you seem to not know that it even happens at all.
Great write-up! There's a few things you can do as either a producer or consumer to thwart this sort of attack:
Producers:
* Self-hosted infrastructure should not be running anonymous code. PRs should be reviewed before code executes on your infrastructure. Potentially should be a GitHub default when using self-hosted runners?
* Permissions for workflows and tokens should be minimal and fine-grained. "permissions: read-all" should be your default when creating a new workflow. Prevents lateral movement via modifying workflow code.
* Self-hosted infrastructure should be isolated and ephemeral, persistence was key for lateral movement with this attack.
Consumers:
* Use a lock file with pinned hashes, either --require-hashes or poetry/pipfile (see the sketch after this list)
* Review the diff of the file getting installed, not the GitHub source code. This will get easier when build provenance becomes a feature of PyPI.
* If your organization is large enough, consider mirroring PyPI with approved releases so the manual review effort can be amortized.
* More coming in this space for Python, like third-party attestations about malware, provenance, build reproducibility, etc. Stay tuned! :)
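To make the hash-pinning item concrete, here's a minimal sketch of a CI job that refuses anything not matching a recorded hash (Python version and file names are illustrative):

```yaml
jobs:
  install-deps:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # The lockfile is generated once, e.g. with pip-tools:
      #   pip-compile --generate-hashes requirements.in
      # --require-hashes then rejects any download whose hash differs.
      - run: pip install --require-hashes -r requirements.txt
```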
> PRs should be reviewed before code executes on your infrastructure
Very often local test results can't be trusted, especially for projects with architecture-level code like PyTorch. Before merging, the test results need to be checked. And being safe doesn't just require reviewing the PR; it requires reviewing every commit the contributor pushes while fixing the failing test cases. Even if we assume the maintainer reviews each commit within a day, it could take weeks or months for the contributor to fix a failing test case this way, with the maintainer looking at the same PR every day.
> We used our C2 repository to execute the pwd && ls && /home && ip a command on the runner labeled “jenkins-worker-rocm-amd-34”, confirming stable C2 and remote code execution. We also ran sudo -l to confirm we had root access.
While it's not clear whether it was a curated list of commands or just ALL, I assume the latter, and that makes me feel no system administrator was involved in that pipeline's setup; those guys are quite allergic to giving out sudo/root access at all.
I’m only going to get into detail if anyone cares, but the actionable, germane thing is to be ready to move to torch-like tensor libraries.
The design flaws in PyTorch that make it an interesting attack vector (which, in a very, very small way, I had a part in; I also got this wrong) are the same flaws that basically dictate it’s more like an API than an implementation now: all the ways it’s too big to easily audit or port.
It’s a very good UI to accelerated computing, and I suspect it’s the “last one” for a long time.
I hope the next mainstream implementation will be TinyGrad, I fear it’ll be MLX, and I’d settle for JAX, but it won’t be PyTorch per se.
Also reverse props to the meta bug bounty program manager for not understanding the finding initially. I know it's difficult managing a program but it's not an excuse to brush something like this off.
I would like to draw your attention to a different aspect that doesn't seem to have been mentioned in this thread so far: more than 70 different GitHub workflows.
This is being up to your eyeballs in proprietary Microsoft technology, and that is if you are the Colossus-from-Attack-on-Titan kind of tall. And that is in an open-source project...
The article repeats this incantation: the project authors wouldn't have noticed this, the project authors would've never noticed that, we could allow ourselves to be sloppy because the authors aren't likely to oversee the whole thing...
There is something else here that went wrong: programming oneself so deep into a system you have very little control over, and don't have a good grasp of the internal workings of. It shouldn't be surprising that such a system is easily compromised. It's not the specifics of how GitHub Actions operates that set the PyTorch authors up for failure; it's the choice to rely on proprietary tech, massively, without reservations.
The recommended guidance is either vendoring dependencies or pinning to hashes (pip --require-hashes, poetry.lock, pipfile). When updating your dependencies you should review the actual file getting downloaded.
Compiled binaries are harder, you might consider compiling them from source and comparing the output. This is where build reproducibility comes in to play.
There's a lot more coming in the Python packaging security space that'll make this easier and just safer in general. Stay tuned :)
Well… sort of. C has become a standard with several implementations; it gains supply chain security by being decentralized. Likewise, it has many package managers with different repos for language-specific things, and it has many more package managers and repos if we consider UNIX/Linux systems as C development environments, with dynamic linking and the like.
The issue is, for any given implementation, similar attacks could still happen, and the package repos are still probably vulnerable.
It hasn't… but C developers are much more careful about adding a dependency than js/python/rust/go developers. Mostly because adding a dependency in C is more annoying. In those languages it's just about adding one line.
Of course, if you use a distribution and it's a famous library, it's just a line to add as well. But then there is the filter of the distribution, which would work for any language; most developers vendor everything instead, though.
Make the corporate proxy use an allow list only. Even then you fall prey to hacked official PyPI packages, but at least then the cryptominers or Discord cred stealers can’t phone home.
> These days it's practically a necessity for companies to shell out money to some sort of supply-chain protection software (Sonatype, Socket.dev etc.)
A number of serious assumptions here. How can you be sure that you’re protected if you spend money on these commercial tools? It’s an arms race, after all. There are other ways to protect yourself (pinning dependencies, an allow list). A few open source tools are also available to audit code.
It depends on the company. Many companies have bug bounty or vulnerability disclosure programs that explicitly guarantee safe harbor+protections for researchers.
However, not all organizations are happy to be contacted about security issues. Sometimes doing the right thing can still result in (threats of) legal repercussions.
The bug bounties are usually pretty clear that you aren't allowed to make changes in the production systems. Here they made many changes - including changing the name of a release.
The bug bounties also prefer seeing a working attack instead of theoretical reports. So not sure how they could have tested their attack in this situation without making actual changes.
It depends. Sometimes companies only permit testing in specific test domains; other times they permit it as long as your activity is clearly identifiable (e.g., including a custom header in all requests).
It does seem like walking a precarious tightrope.
I know Marcus, the guy they mention that first caught the problem. He had no end of trouble getting Meta to acknowledge the severity of what he'd found, and they just constantly went radio silence on him, in between not really understanding the problem.
I ended up having to reach out to someone senior I knew in the security org there to get them to swoop in and pick up the report, before it got any actual traction (I'd worked with that senior security engineer in a previous job).
One may suspect that they do know, but if you widen the scope of bug bounty programmes to encompass open source project supply chain then your programme immediately turns into a dollar piñata.
For a long time Apple didn't have a bug bounty programme at all. This wasn't because they didn't care about security. It's because their own internal audits were generating enough reports to saturate the available capacity for bug fixing, so paying for more reports would have just duplicated work. Generally this is the pattern at big tech firms: you want to turn your internal security teams to a problem for a while before starting to pay out for a new class of bugs. But of course it's hard to descope a problem from bug bounties, it looks very bad.
I know that it is zeitgeist exploiting to say this, but seeing Boeing listed and not Airbus really says something to me.
Lockheed being listed makes me wonder if the FBI/CIA really will (further) step up on cybercrime, because you now have potential national security implications in a core supplier to multiple military branches.
That's not entirely true, Airbus is a participant in most European military aircraft projects. They participated in Eurofighter for example and are part of the FCAS program.
It's true to the extent that the US does a lot more and broader military procurement in general, so Boeing gets a smaller piece of a much bigger pie. Whereas Airbus gets a piece of most European projects as a member of one consortium or another; it's just a smaller pie.
Are they? Airbus has its hands in quite a few major military programs (Eurofighter, A-400M, Tigre, Super-Puma, ...), as well as in space programs, especially satellite intelligence.
Shameless plug: there are ways to run custom-sized runners without the risks associated with self-hosted runners; fasterci.com's ephemeral runners are one of them.
Is 5k an appropriate amount for such a finding? Sounds incredibly cheap for such a large organization. How much would something like this be worth on the black market?
No; that is, in general, the issue with security bounties. They mainly attract people who have enough time for trial and error, and/or prior domain expertise, and/or deep knowledge of specific software. Nowadays cybersecurity is a vast field, and being a white hat hacker specialized in Google Chrome issues is not the same as being one specialized in iOS. Not saying it cannot be the same person, but the amount of time required to catch issues is long.
I think supply chain attacks are not being taken very seriously. Consider that people working in Python or JavaScript, for example, use pip or npm daily, whether they work for a nuclear agency or for your uncle's bar.
Bug bounties do not compete with the black market. Also on the business side, they are not as efficient as just paying an internal QA or security team. Katie Mousouris, who set up Microsoft's original bug bounty program has gone into a lot of detail on this. E.g. https://www.zdnet.com/article/relying-on-bug-bounties-not-ap...
This question comes up frequently with these, and it's premised on the hypothetical value of the bug on 'the black market'. The vast majority of such reported vulnerabilities have a 'black market' value of roughly zero, though, including this one. This doesn't say anything about the quality of the research, just that it's pretty hard to get monetary or other value out of most vulnerabilities.
It’s quite a bit more nuanced than that. Businesses only want to pay because it costs less than the damage done to the brand and/or lawsuits from users/data controllers. They don’t want to pay more than that. Researchers need money and are able to sell the fruits of their research to whomever they want. Generally, good-natured people will first check whether the bounty is worth it. It’s clean money, so it has additional value vs. selling it on the black market.
So, as you can hopefully see, it is a balancing act between all parties.
No, I don't think that holds much explanatory power - the vast majority of vulns have not only zero black market value, they also carry effectively zero brand or legal liability risk. This is also the case for this vuln.
Generally, getting root on internal infrastructure is just a step away from doing whatever you want. Even if it is just waiting for someone to ssh in with -A set so you can steal their ssh keys.
A good rule of thumb is that if an exploit doesn't drop pin-compatibly into a pre-existing business model that has repeatedly used similar exploits in the past, it's worth nothing in a "commoditized" vulnerability market --- the kind HN tends to think of in these stories ("Zerodium" being the most common example). You can theoretically find someone who will listen to your story and give you money, but at that point you aren't so much selling a vulnerability as helping plan a heist.
I could be wrong about this, but I've been loud about it around people who do a lot of this stuff and none of them have dunked on me in public. :)