
[Feature Request]: generating SBOMs for container images while building them #274

Closed
developer-guy opened this issue Oct 21, 2021 · 13 comments
Labels
build Improvements to developers build experience with Docker

Comments

@developer-guy

I'm a huge fan of the bake command. I recently opened a similar issue for builds, which you can see here.

SBOMs (Software Bills of Materials) are a hot topic these days, so we thought we could support SBOM generation as a separate target within docker-bake.hcl. There are many tools for generating SBOMs.

We could pick one of them to generate SBOMs while building container images.

cc: @luhring @nishakm @puerco @Dentrax @imjasonh 🥳🙋🏻‍♂️

@luhring

luhring commented Oct 21, 2021

I love this. 😍

Curious what the requirements should be (e.g. support for multiple SBOM formats).

(I'm a maintainer on Syft) Let me know if I can help out in any way!

@nishakm

nishakm commented Oct 21, 2021

I have a PoC that does exactly this: https://github.com/vmware-samples/containers-with-sboms. It would be super cool if buildx could integrate SBOM generation every time a filesystem snapshot is created.

@justincormack justincormack transferred this issue from docker/buildx Oct 22, 2021
@justincormack
Member

I moved this issue to our roadmap repo to get broader feedback.

@coderpatros

I 100% think that the most accurate SBOMs are generated at build time, with close native integration with build systems.

But the entire build system of the assembled software, in this case, can actually be a combination of anything. And parts of it can be completely opaque to Docker tooling when building the container.

If this is implemented it will need to be clearly defined what it can and cannot do.

I don't want to be a party pooper. I'm a big advocate of SBOMs. I just don't want to see another rushed, useless implementation.

And to be honest, I'd rather see a useful SBOM for the Docker tooling itself first. There are already some good tools, like Syft and Tern, for container image SBOM generation.

@hectorj2f

> But the entire build system of the assembled software, in this case, can actually be a combination of anything. And parts of it can be completely opaque to Docker tooling when building the container.

I totally agree with this ☝🏻 . It will miss dependencies and information.

> I'd rather see a useful SBOM for the Docker tooling itself first.

Yes, that is one option; another is to have two SBOM files: one for the build (source code repo) and one for the runtime (container).

@nebuk89 nebuk89 added the build Improvements to developers build experience with Docker label Oct 22, 2021
@nishakm

nishakm commented Oct 25, 2021

> But the entire build system of the assembled software, in this case, can actually be a combination of anything. And parts of it can be completely opaque to Docker tooling when building the container.

Very true! One thing Tern does is parse the created_by data to figure out what the builder intended. It's not very good at figuring out full shell scripts, though, especially if the shell scripts use build arguments. In this case, I wonder if we can get closer to a more accurate SBOM if some of that data is provided by the user.

@luhring

luhring commented Oct 25, 2021

> I wonder if we can get closer to a more accurate SBOM if some of that data is provided by the user.

💯 To me, this will be a necessity. As we're mentioning, there are numerous cases where analyzing only the image will give you an incomplete picture of what software is present, even with the best analysis available. If the goal is "completeness" in the image artifact's SBOM, user input of information that was victim to lossy transformations will be critical. We're working on this in Syft — and I'm sure other SBOM tools can/will handle this as well. 👍

@justincormack
Member

@coderpatros by "useful SBOM for the Docker tooling itself" do you mean inputs that Docker already knows about, like base layers? I totally agree that there is a difficult mix of things in Docker builds, potentially arbitrary shell scripts and network access, and so we are going to have to use a mix of methods.

For people building tools, one question I have: what hooks would be useful to you? If we have to plumb data through (input SBOMs from base images, input SBOMs from added software, analysed parts), what kind of hooks would make this easier for your tools?

@coderpatros

@justincormack I mean an SBOM that describes, as an example, the Docker CLI.

@tianon

tianon commented Nov 1, 2021

This is very interesting 😄

I think anything that happens in docker build by default would make me a little wary (given the potential overhead of deep calculation/inspection of things like packages inside the image), however I think there'd be a ton of value in optionally including more of the Dockerfile/build context data somehow.

Some of the data that's really difficult to get after the fact that Docker itself is uniquely suited to provide are exact image IDs/digests or even locations/names for base images and information about the other build stages that helped create the final image. For example, the specific openjdk tag/digest I used to build my-application.jar is very relevant information for that final my-application.jar artifact.

There are a lot of blurry lines here, depending on how deep a user might want the metadata to go. The degree of detail is probably going to change the calculation/information-gathering overhead pretty significantly, and for users building closed-source solutions it could mean too much information, leaking things they didn't want to, like details about their source code, internal container registry, or worse.

(I guess what I'm trying to get at there is that all aspects of this probably need to be opt-in?)

For my own use cases, I don't think I'd want this to happen during docker build itself unless it was very, very fast (so that it's not in the critical path for build/push).

To illustrate a bit better, a full clean build of all the variants of https://hub.docker.com/_/python already takes several hours per architecture, even on a reasonably fast machine, so having the SBOM calculated out-of-band could be pretty dramatic.

@imjasonh

imjasonh commented Nov 1, 2021

> Some of the data that's really difficult to get after the fact that Docker itself is uniquely suited to provide are exact image IDs/digests or even locations/names for base images and information about the other build stages that helped create the final image. For example, the specific openjdk tag/digest I used to build my-application.jar is very relevant information for that final my-application.jar artifact.

See #243 for a concrete proposal toward this goal.

@nishakm

nishakm commented Nov 3, 2021

> People building tools, one question I have is what hooks would be useful to you? If we have to plumb data through (input SBOMs from base, input SBOMs from added software, analysed parts) what kind of hooks would make this easier for your tools?

A few things come to mind for me:

  1. A record of the base image in the "created_by" field or a dedicated field in the config.
  2. A record of build argument values (although this would close the option of passing secrets via the build arguments, I personally believe this is a good thing)
  3. Some option in docker build for an SBOM generation tool to access the mountpoint
  4. Some ability to include or reuse SBOMs created externally
  5. Some ability to record the state of intermediate containers during a multi-stage docker build (this is a pet peeve of mine as this is the bit that most container builders tell me is impossible to do using docker now)

To @imjasonh's point about recording the base image: to make it easier for tools to parse this information, it would be nice to record the base images all the way down to scratch. For example, multiple base images have contributed to the final golang image.

As for the shell script parsing, some environment variable substitution would help greatly. Tern currently tries to do this with some success.
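A small Go sketch of that substitution step, assuming the build-argument values were recorded somewhere (item 2 in the list above) and handed to the scanner as a map. It uses the standard library's os.Expand; the function name expandBuildArgs and the TOOL_VERSION argument are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// expandBuildArgs substitutes $NAME and ${NAME} in a RUN command using
// recorded build-argument values. Unknown names are written back as ${NAME},
// so a scanner can still see which values were never recorded.
// Shell forms such as ${NAME:-default} are passed through unevaluated.
func expandBuildArgs(cmd string, args map[string]string) string {
	return os.Expand(cmd, func(name string) string {
		if v, ok := args[name]; ok {
			return v
		}
		return "${" + name + "}"
	})
}

func main() {
	args := map[string]string{"TOOL_VERSION": "1.2.3"} // hypothetical build arg
	cmd := "curl -fsSLO https://example.com/tool-${TOOL_VERSION}.tar.gz && echo $HOME"
	fmt.Println(expandBuildArgs(cmd, args))
}
```

Writing unknown names back rather than expanding them to an empty string keeps the output honest: an SBOM tool can flag `${HOME}` as unresolved instead of silently reporting a wrong path or version.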

@chris-crone
Member

With Docker Desktop 4.7.0 (released yesterday), we have shipped an experimental docker sbom CLI command. The command scans and then outputs the SBOM of a container image using the Syft project. You can find its source code here.

As discussed in our blog post, this is just the first step. The goal is to work with partners and the community to add SBOM generation directly into docker build through BuildKit integrations. We have opened an issue on the BuildKit repo to get help and input.

Please give the docker sbom command a try and give us feedback on its repo!

We'd also love for anyone interested in collaborating on this work to engage on the BuildKit repo or on the Docker Community Slack.

Projects
Status: Shipped! Enjoy!