Provide recommendation to counter xz utils style attack #560

Open

david-a-wheeler
Contributor

The malicious attack on the xz utils slipped through many defenses because the "source" package included pre-generated malicious code. This meant that review of the source code (e.g., as seen by git) couldn't find the problem.

This proposes a best practice to counter it. The text is longer than I'd like, but it's hard to make it short, and this was a worrying attack, so I think it's reasonable to say this.

We'll probably need to renumber this proposal if we also add the proposed text to counter attacks like polyfill.io: #559 ... but I think that's okay!

Signed-off-by: David A. Wheeler <dwheeler@dwheeler.com>
@david-a-wheeler
Contributor Author

@ljharb @ctcpip - If #559 is accepted, I propose numbering this one after #559. These are two different proposals, though, so I thought it'd be easier and faster to consider them in parallel; we can renumber things afterwards :-).

Signed-off-by: David A. Wheeler <dwheeler@dwheeler.com>
Contributor

@SecurityCRob SecurityCRob left a comment

+1

@drusso-rh
Contributor

+1

@@ -33,5 +33,6 @@ Here is a concise guide for all software developers for secure software developm
24. **Continuously improve**. Improve scores, look for tips, & apply as appropriate.
25. **Manage succession**. Have clear governance & work to add active, trustworthy maintainer(s).
26. **Prefer memory-safe languages**. Many vulnerabilities involve memory safety. Where practical, use memory-safe programming languages (most are) and keep memory safety enabled. Otherwise, use mechanisms like extra tools and peer review to reduce risk.
27. **Ensure source packages have only version-controlled source, and rebuild all source to create production package(s)**. E.g., if you use autotools, don't include a generated `configure` file in a source package, but instead ask recipients to build it (e.g., with `autoreconf`). This eliminates a malware-hiding mechanism, as illustrated by an attack on [xz utils](https://access.redhat.com/security/cve/CVE-2024-3094).
Contributor Author

if you build it ... from the version controlled repository.

Contributor Author

Basically, we want people to get the version-controlled repo, or a copy of the repo's contents. If the tarball has different contents, that's a risk.
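
To make the "different contents" risk concrete, here is a minimal sketch that lists files present in a source tarball but absent from version control. It assumes the caller already has the set of tracked paths (for instance from `git ls-files`); the function names, the `pkg-1.0/` prefix, and the demo files are hypothetical, not part of this proposal.

```python
import io
import tarfile


def untracked_members(tar_bytes: bytes, tracked: set[str], prefix: str = "") -> list[str]:
    """Return regular files in the tarball whose paths are not in `tracked`."""
    extras = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other non-regular entries
            name = member.name
            if prefix and name.startswith(prefix):
                name = name[len(prefix):]  # drop the top-level release directory
            if name not in tracked:
                extras.append(name)
    return sorted(extras)


def make_tar(files: dict[str, bytes]) -> bytes:
    """Build an in-memory tarball so the demo needs no files on disk."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Hypothetical release: two tracked sources plus a generated `configure`.
demo_tar = make_tar({
    "pkg-1.0/configure.ac": b"AC_INIT\n",
    "pkg-1.0/src/main.c": b"int main(void) { return 0; }\n",
    "pkg-1.0/configure": b"#!/bin/sh\n# generated\n",
})
demo_tracked = {"configure.ac", "src/main.c"}  # assumed output of `git ls-files`

extras = untracked_members(demo_tar, demo_tracked, prefix="pkg-1.0/")
print(extras)  # ['configure']
```

A name-level check like this only flags files that exist outside version control; it says nothing about whether a tracked file's contents were altered.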

Member

"different" is tricky here tho - in any legit package with a build process, the tarball will always have different contents, because it'll have the built output, and usually will omit the raw source.

It's more that the repo, with the build process reapplied, needs to match the tarball's contents, with various heuristics applied (to account for timestamp differences and build machine differences and whatnot).

It's really hard to capture this nuance, though.
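
One rough way to sketch the comparison described above is to hash file contents only, so timestamps and ownership never enter the result. This is an illustration under stated assumptions, not a full reproducible-build check: `rebuilt` and `published` are hypothetical in-memory mappings of path to bytes, and build-machine differences inside file contents are not normalized.

```python
import hashlib


def manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each path to the SHA-256 of its contents (metadata is ignored)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def diff_manifests(rebuilt: dict[str, str], published: dict[str, str]) -> dict[str, list[str]]:
    """Report paths unique to either side and paths whose contents differ."""
    rebuilt_paths, published_paths = set(rebuilt), set(published)
    return {
        "only_rebuilt": sorted(rebuilt_paths - published_paths),
        "only_published": sorted(published_paths - rebuilt_paths),
        "changed": sorted(
            p for p in rebuilt_paths & published_paths if rebuilt[p] != published[p]
        ),
    }


# Hypothetical outputs: what we built from the repo vs. what the registry serves.
rebuilt = {
    "index.js": b"module.exports = 1;\n",
    "package.json": b'{"name": "demo"}\n',
}
published = {
    "index.js": b"module.exports = 1;\n",
    "package.json": b'{"name": "demo", "scripts": {"install": "node install.js"}}\n',
    "install.js": b"// not present in the repo build\n",
}

report = diff_manifests(manifest(rebuilt), manifest(published))
print(report)
```

Any entry under `only_published` or `changed` is a candidate for manual review; deciding which differences are benign is the nuance that is hard to capture.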

@ctcpip
Member

ctcpip commented Jul 16, 2024

Thinking about this a bit more. Neither published artifacts nor source repositories are completely safe from supply chain attacks, but are we confident that the surface area for risk is absolutely less with source repos? And for all ecosystems and package registries? If the xz attack had been via the source rather than the artifact would we be having a different conversation?

One potential protection that (at least some) package registries give you is that usually once an artifact is published, it cannot be tampered with.

For example, if I `npm install express@4.19.2`, I am guaranteed to have the original published artifact -- it cannot be hijacked.

But if I `npm install git+ssh://git@github.com:expressjs/express.git#4.19.2`, it does not have the same guarantees. It could potentially be anything.

Notably, if the project was compromised, a rogue maintainer could poison the source repository and in this example, version 4.19.2, but they could not poison 4.19.2 in the npm registry.

Therefore, there are scenarios in which relying on the published package is safer.
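
A consumer can check that immutability independently by pinning a digest of the artifact. The sketch below assumes the pin uses the Subresource Integrity string format (`sha512-` followed by a base64 digest), which is the form npm lockfiles record in their `integrity` field as far as I know; the artifact bytes and function names here are hypothetical.

```python
import base64
import hashlib


def sri_sha512(data: bytes) -> str:
    """Return an SRI-style string: 'sha512-' plus the base64 SHA-512 digest."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def matches_pin(data: bytes, pinned: str) -> bool:
    """True when the downloaded bytes hash to the pinned integrity value."""
    return sri_sha512(data) == pinned


# Hypothetical artifact bytes; a real check would read the downloaded tarball.
original = b"contents of the published tarball"
pinned = sri_sha512(original)  # recorded once, e.g. in a lockfile

ok = matches_pin(original, pinned)
tampered = matches_pin(original + b" plus something extra", pinned)
print(ok, tampered)  # True False
```

A git ref such as a tag offers no equivalent pin unless the consumer records the full commit id it resolved to.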

There is also something to be said for non-malicious situations, where maintainers are correcting tags or simply making mistakes.

For example, I have deleted erroneous tags in projects before. Some projects use a tag format like 1.0.0, 1.1.0, etc., and some use v1.0.0, v1.1.0, etc., and sometimes you get a stray tag that doesn't match the pattern the project uses. So we just delete the bad tag and add the correct one, but this will break anyone pulling in the bad tag. Also, when I push the new tag, I could mistakenly associate it with the wrong commit. Maybe this reintroduces a CVE that was patched, making downstream consumers vulnerable; importantly, this would go undetected by security scanners because the version is marked as fixed or unaffected.
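
The tag scenario above can be sketched as a comparison of two snapshots of a repository's tags. This assumes each snapshot is a mapping of tag name to commit id, gathered for example from `git ls-remote --tags` at two points in time; the tag names and shortened commit ids below are made up.

```python
def tag_drift(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Classify tags that were deleted, re-pointed to another commit, or added."""
    return {
        "deleted": sorted(t for t in before if t not in after),
        "moved": sorted(t for t in before if t in after and before[t] != after[t]),
        "added": sorted(t for t in after if t not in before),
    }


# Hypothetical snapshots: a stray `1.1.0` tag is replaced by `v1.1.0`,
# and `v1.2.0` is re-pushed pointing at a different commit.
before = {"v1.0.0": "a1a1a1a", "1.1.0": "b2b2b2b", "v1.2.0": "c3c3c3c"}
after = {"v1.0.0": "a1a1a1a", "v1.1.0": "b2b2b2b", "v1.2.0": "d4d4d4d"}

drift = tag_drift(before, after)
print(drift)
```

Here `deleted` is what breaks consumers pulling the old tag, and `moved` is the case where the same version name silently resolves to different code.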

Clearly there is not a perfect answer. The question is, do we have a recommendation that we can confidently say is the better option? I'm not convinced at this point that there is, and I think there is some nuance depending on which ecosystems and registries we are talking about.

@ljharb
Member

ljharb commented Jul 16, 2024

Indeed in any language with a package repository, git is for development, not consumption.

I continue to think the only true solution here is to use the git repo, replicate the build process, and measure how close the result is to the published, immutable artifact - post publish.

@ctcpip
Member

ctcpip commented Jul 16, 2024

flowchart
    0[Install tarball from registry]
    1[Rejoice]
    a{
      Concerned
      about
      compromised
      tarballs?
    }
    b(Build from source)
    c{
      Concerned
      about
      compromised
      source?
    }

    0 --> a
    a --> |YES| b
    b --> c --> |YES| 0
    a --> |NO| 1
    c --> |NO| 1
    

@ljharb
Member

ljharb commented Jul 16, 2024

Indeed, because then you get to rely on the immutable tarball, and you get to verify its contents based on the repo independently without any unpaid work from maintainers in a way that anyone can audit and replicate, including for historical packages whose maintainers are no longer around.

Member

@ctcpip ctcpip left a comment

using this suggestion, if I want version 1.2.3 of a package, I have to:

  • determine if a 1.2.3 branch or tag exists in the source repo, and get the name of that branch/tag (it may or may not be 1.2.3 exactly)
  • if the branch/tag exists, I must trust that it matches the released artifact's substantive content
    • if I can't trust that it matches, then what do I do?
  • if I can't find a matching branch/tag, or don't trust it, or don't want to use it, then what do I do?

There is also no guarantee that the published artifact at version 1.2.3 matches what the repo says via branch or tag is 1.2.3. The published artifact is not associated with a commit id or anything (and even if it were, that commit id may not exist anymore).

If source repos are clean and well-maintained, then maybe this works. But they are, quite often, far from being clean and well-maintained. Often tags and branches are missing entirely, or the maintainers stopped pushing tags seven years ago, or the repo tags and the package registry carry completely different versions, e.g., the registry has only 1.0, 1.2, 1.4, and the repo has only 1.1, 1.3. I have actually seen this in production packages with many downloads.
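
A mismatch like this is easy to surface mechanically, even if resolving it is not. The sketch below assumes the only naming difference between tags and registry versions is an optional leading `v`; real projects use other schemes, and the function names are hypothetical.

```python
def normalize_tag(tag: str) -> str:
    """Strip a leading 'v' when it is followed by a digit (v1.2.3 -> 1.2.3)."""
    if tag.startswith("v") and tag[1:2].isdigit():
        return tag[1:]
    return tag


def compare_versions(registry: list[str], repo_tags: list[str]) -> dict[str, list[str]]:
    """Split versions into registry-only, repo-only, and present in both."""
    published = set(registry)
    tagged = {normalize_tag(t) for t in repo_tags}
    return {
        "registry_only": sorted(published - tagged),
        "repo_only": sorted(tagged - published),
        "both": sorted(published & tagged),
    }


# The mismatch described above, with one tag using the `v` prefix.
result = compare_versions(["1.0", "1.2", "1.4"], ["v1.1", "1.3"])
print(result)
```

The lists are sorted as plain strings, which is enough for a report but is not a semantic-version ordering.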

And this gets even murkier when we get into vulnerability management. If I can't properly associate what I'm using with an exact version, then I can't accurately know what CVEs I am impacted by.

There are also other downsides to consuming the source repo directly that are orthogonal to supply chain attack risk.

This recommendation seems to boil down to "don't use package registries/artifactories; don't use published artifacts; use only source code repositories". I think it relies too much on assumptions that we cannot make about source repos. The source repo is the canonical truth, but the published artifact is what is meant to be consumed. For these reasons, and the fact that using published artifacts actually offers some protection that the source repo cannot offer, I find it difficult to recommend this as currently written.
