-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal to add oci
as purl type
#123
Conversation
ca4117e
to
2ec9f94
Compare
The key thing I keep seeing come up is an assumption that there is a single public endpoint for certain content. We wrote this cross-cloud, vendor-neutral post (Consuming Public Content) last year when Docker was introducing throttles to account for the costs of running a public service. The reality is content is being widely redistributed. ECR Public, GitHub Registry and others. The same nginx image can be retrieved from multiple public endpionts. If it's pulled into a private registry, which should be "the truth"? This is no different than buying the same part from different distributors. The part is still the same part, regardless of where it was acquired. This gets more complicated for private content. A company may not want to disclose where the content was initially created, as that may be some security disclosure of an attack vector. What I'm hoping is achieved is the identify of the entity that created the artifact is addressed in a signature. Did good-co create this artifact abc123, vs. bad-co that created def456. What I'm hoping we do is adopt a model of uniquely identifying an artifact, completely independent from where it may have been acquired. |
PURL-TYPES.rst
Outdated
@@ -221,6 +207,34 @@ nuget | |||
|
|||
pkg:nuget/EnterpriseLibrary.Common@6.0.1304 | |||
|
|||
oci-artifact |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If container images are in scope for this, then I would consider renaming this purl type to just oci
. Not all images conform to OCI Artifacts; e.g. multi-platform images don't fit into that model currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above where we agreed this would be oci-artifact as images are one of many types of artifacts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming is as hard as scoping :-) I agree with @jonjohnsonjr if the scope was docker vs oci images we should name it oci or oci-image.
I think longer term though we should probably just scope this as all oci-artifact types stored in one of these oci compatible registries. I think that's best for the OCI consumer? borrowing the definitions from the distribution spec:
- **Registry**: a service that handles the required APIs defined in this specification
- **Client**: a tool that communicates with Registries
- **Push**: the act of uploading Blobs and Manifests to a Registry
- **Pull**: the act of downloading Blobs and Manifests from a Registry
- **Blob**: the binary form of content that is stored by a Registry, addressable by a Digest
- **Manifest**: a JSON document which defines an Artifact. Manifests are defined under the OCI Image Spec <sup>[apdx-2](#appendix)</sup>
- **Config**: a blob referenced in the Manifest which contains Artifact metadata. Config is defined under the OCI Image Spec <sup>[apdx-4](#appendix)</sup>
- **Artifact**: one conceptual piece of content stored as Blobs with an accompanying Manifest containing a Config
- **Digest**: a unique identifier created from a cryptographic hash of a Blob's content. Digests are defined under the OCI Image Spec <sup>[apdx-3](#appendix)</sup>
- **Tag**: a custom, human-readable Manifest identifier
in that context(scope) .. oci-artifacts would mean conceptual pieces of content stored as OCI Blobs with an accompanying OCI Manifest containing a OCI Config
This has been true for non-cloud native packages for over a decade. "The truth" is resolved when the package is retrieved from somewhere and wherever it was retrieved from forms the basis of many provenance use cases.
This has nothing to do with oci-artifact and is true for any purl type. If an org uses an internal repo and they do not want to disclose it, they simply omit (or redact)
Then what you're looking for is no longer PURL, rather Package ID or something similar. Again, completely outside the scope of the oci-artifact ticket we're discussing. If you want to separate identify from location it needs to be done in the core spec, not in the purl types. However, I will caution that separating these two concepts will likely break compatibility for the millions or users that have already adopted PURL. |
I don't think what @stevespringett and @SteveLasker are saying is contradictory, if we start with the premise that a purl contains both an identity and a location, then as a consumer of an With a second assumption that we believe it's useful to be able to refer to
We can also emphasize that you can compare the identity of an Does this seem a reasonable way forward? |
Thanks folks. It is complicated. I'm generally in agreement with Williams's suggestion. I don't think we should default to hub.docker.io, as this could imply a squatting attack, where someone could place a matching named artifact in docker hub. |
Because the name is a content hash they couldn't just place any artifact there, they'd have to place the same artifact there. Or did I miss something? If as the author of the purl you know it's not on I know this isn't perfect, just trying to come up with a reasonable compromise in the meantime. |
I'm going to use As long as it could be nothing, than I think we're fine. |
If the image was a malicious dupe and content was changed, then the digest would change, right? I like @iamwillbar's suggestion and think it's a healthy compromise for what we want to achieve right now :) Thanks to you both for working through this. |
@iamwillbar Thanks for helping us understand the nuances of the spec! It looks like we have consensus on using As for the package type, if we are OK being vague about the types of artifacts we are distributing (it looks like we are considering there is a |
Thanks for getting this started @rnjudge! I think this will help with some of the awkwardness we found when adopting PURL for OCI images in TektonCD and In-Toto over here: https://github.com/tektoncd/chains/blob/7d1f010eaf55accb83ce712a11bbe83921cdcc30/pkg/chains/formats/intotoite6/intotoite6.go#L250 A couple high-level questions:
I don't feel too strongly about the overall name ( Going forward, I think the new |
+1 here, I think this would lead to the least amount of confusion long term. It also matches the other examples well like you described. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rnjudge @SteveLasker @nishakm Thank you this PR!
For a start, can you remove the removal of the docker type (or its deprecation) from this PR?
If you feel strongly about removing docker, then a separate PR would be best; but the probability of this removal to be merged is tangentially approaching zero for several reasons highlighted in the discussion here, plus one below.
The same way there is a gopher
URL scheme registered at IANA that is still seldom used, we would likely never remove any types from the spec just because a type may be outdated. We could mark some as deprecated though, but this would still be another discussion of its own. More likely, less popular types will fall out of fashion and will stop being used from natural attrition.
As a small nitpicking purl
is written lowercase throughout the spec; for consistency, it would be best to use purl
rather than the arguably prettier pURL
.
About the type name , I feel that oci
is a more concise name that could save a lot of typing.
Let me review all the discussions in details over the week-end and provide more feedback!
There's a few reasons we want to include this. First off, it's an optional qualifier -- so folks who feel that it's redundant don't have to include it and parsing tools who want to ignore it, can easily do that. Second, it's helpful for the case of using purl in the SPDX spec or any other SBOM using purl that is designed to be more human readable. Specifically, when someone looking at or parsing an SBOM may not use a client tool to verify the artifact type, it could be a helpful hint for them to be able to see the type declared. Of course, it's possible to put a different artifact type that doesn't match the object/API but that's true for any of the fields -- you can put anything you want as long as it's in the right format it's going to technically be a valid purl. It's up to the consumer of the purl to verify its correctness.
Can you elaborate on your concerns a little bit about how it would make it harder?
+1 about the confusion which is why it was proposed to remove the docker entry. However, compatability concerns are understood so we will leave it for now (see below comment)
I don't know about a "deprecating" field but the plan is to make a note in the docker entry that points/encourages users to use oci artifact until we come up with a more formal way to deprecate docker. I'm waiting for some of the discussion topics to resolve before updating the PR so we don't lose them once I push the changes. |
Yes, that is the plan :) I'm just waiting for some of the discussions to clear up before updating the PR so we don't lose the ones happening on the file reviews. Note that I am hoping to add a note under the
Deprecating or natural attrition is fine. I think a note in the docker section would speed up this natural attrition. Apologies for the oversight in removing docker from my first PR. We had originally thought docker was only added in 2021 and wouldn't affect compatibility too much.
Thanks for the clarification, Philippe. I will make the necessary changes to lowercase
This feels like a weak argument when weighed against the justifications for keeping "oci-artifact" :) Most of these purls will be generated automatically so no typing is necessary. Even if it had to be typed out, I can type
|
I'll disagree here as well and agree with @pombredanne. Again, I'm not terribly opinionated on the name but |
Here's some pseudocode that I think would need to be written to correctly convert some example container image references to this PURL spec:
vs something like:
|
This raises the philosophical question for the PR authors of what is the intent of this purl, is it to refer to an OCI Artifact or to anything stored in an OCI registry? If the former I'd recommend sticking with |
I want to preface this and say I'm not trying to be snarky here.. but... I'm not sure an extra 3 lines of code can qualify as "making it harder" to create this purl -- it's straightforward to assign a tag and version with the current proposal. Additionally, we're not proposing an if/else for tag and version (digest) -- they can both (and I would argue, should) be included if available. Second, I have an issue with choosing tag or digest for the version. There's value in both of them being there, especially the digest which is the only thing that's actually unique about the artifact you are trying to identify. Using only the tag does not help "reliably identify" the software as purl is intended. |
@iamwillbar It's intended to treat all content in a registry with one purl. If the confusion around @dlorenc is there another term that could use instead of |
The OCI artifact (I am using the term loosely here) digest is the best way to reliably identify this type of artifact. The tag is mutable. However, the digest is not human readable, hence we allow for |
I agree strongly! I'm just worried about doing this in the tooling I work on, where you typically only have one. It's technically possible to include both in the string when pulling an image: with something like
I can't think of one :( |
Would oci-content work? As in, content in a registry. Prefer not to invent a new term not used by the community unless needed. |
Works for me! Either that |
Understand the concern here. However, instead of adopting this proposal to use what's convenient for some tooling , we should include the digest/version as a separate entry from the tag instead of the digest/version and tag being either/or. Not to mention, it's best practice when referring to container images to refer to them using their immutable digest and we should strive for this instead of encouraging folks to default to only using the tag. Furthermore, the purpose of the purl spec is to "reliably identify and locate software" and the digest is the only way we can "reliably identify" (aka uniquely identify) oci artifacts in registries -- tags cannot help us here. I'm confident that the extra parsing to differentiate tag from digest is not totally unreasonable for the benefit we get in exchange using the digest as the version.
I think |
That is what was proposed but has not happened (yet?). As defined right now, OCI Artifacts are an application of the OCI Image Manifest Specification. It does not generalize to include, for example, the OCI Image Index Specification, which are widely used for distributing multi-platform images. The dependency graph looks (very roughly) like this: If this proposal is specific to OCI Artifacts, that's fine, but that should be more clear. If this proposal applies generically to things that are distributable by distribution-spec, that's also fine, but I have some feedback to make this more clear. |
Just a silly question. Why are we trying to reinvent a generic term, when OCI has already defined OCI Artifacts as the generic means to reference content in a distribution-spec based registry:
|
2ec9f94
to
6aea355
Compare
oci-artifact
as pURL typeoci
as purl type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rnjudge Thank you ++ for the update.. See my few nits inline for your review so we can merge this.
PURL-TYPES.rst
Outdated
- The ``name`` is not case sensitive and must be lowercased. The name is the | ||
last fragment of the repository name. For example if the repository | ||
name is ``library/debian`` then the ``name`` is ``debian``. | ||
- The ``version`` is the ``@sha256:digest`` of the artifact and is strongly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The ``version`` is the ``@sha256:digest`` of the artifact and is strongly | |
- The ``version`` is the ``sha256:hex_encoded_lowercase_value`` of the artifact and is strongly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pombredanne is it possible to make the version
field mandatory for oci package types? I know that version is optional according to the spec but wasn't sure if individual types can override this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of hex_encoded_lower_case_value
could we say hex_encoded_digest
? I think it's important to keep the digest
terminology as it's something that every artifact stored in an oci registry will have and it's clear what this value would be for this package type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rnjudge I reused hex_encoded_lower_case_value
because this is already the term used in the main spec for checksums. I am find with using something else, but digest alone does not tell anything special about a canonical representation (such as lower case) AND it (hex_encoded_lower_case_value
) would have to be updated to there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pombredanne is it possible to make the
version
field mandatory for oci package types? I know that version is optional according to the spec but wasn't sure if individual types can override this.
You can state that, but that does will not prohibit folks not to use this. Also there is a use to have a purl without a version to point to some package, any version (as mentioned by @iamwillbar in another issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rnjudge I reused
hex_encoded_lower_case_value
because this is already the term used in the main spec for checksums. I am find with using something else, but digest alone does not tell anything special about a canonical representation (such as lower case) AND it (hex_encoded_lower_case_value
) would have to be updated to there
@pombredanne hex_encoded_lowercase_digest
?
PURL-TYPES.rst
Outdated
last fragment of the repository name. For example if the repository | ||
name is ``library/debian`` then the ``name`` is ``debian``. | ||
- The ``version`` is the ``@sha256:digest`` of the artifact and is strongly | ||
recommended to uninquely identify the artifact. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recommended to uninquely identify the artifact. | |
recommended to uniquely identify the artifact. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo ;) Thank you
This looks great to me now, thanks @rnjudge! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great. I've made a couple of minor comments/suggestions.
PURL-TYPES.rst
Outdated
- There is no canonical package repository for OCI artifacts. Therefore | ||
``oci`` purls must be registry agnostic by default. To specify the repository, | ||
provide a ``repository_url`` value. | ||
- OCI purls do not require a ``namespace``. The ``namespace`` is specific |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not require or do not support? I think it's the latter, in which case I'd say something like:
- OCI purls do not require a ``namespace``. The ``namespace`` is specific | |
- OCI purls do not contain a ``namespace``, although the ``repository_url`` may contain a namespace as part of the physical location of the package. |
6aea355
to
c883991
Compare
@pombredanne @iamwillbar nits are resolved on this PR. Please re-review and let me know if there's anything else that needs changing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are almost there.... just a dangling leftover that would need to be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rnjudge Thank you for pushing this through!
I am approving this as it is. There are some minor refinements that I would like to apply, but that's better done in a separate PR and that would be only on the form and not on substance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @rnjudge for facilitating this discussion as we hope to use purls for all package references.
A few nits on package_url
references, and a tweak on wording around the location. The location is important to fetch the artifact, but the location will change as the artifact is promoted, so the repository_url
provided at creation is most likely not the URL when consumed. It could also be internal disclosure of information that could be used to hack a system, so just suggested a slightly relaxed reference.
The decoupling of location from identity nails the minimum requirements to enable purls to be used generically for package references.
While it's unfortunate the artifactType
was removed however, It doesn't impact the identity. It was purely fur humans to more easily troubleshoot references and could be added at a later date without breaking existing purls.
PURL-TYPES.rst
Outdated
- ``tag``: artifact tag that may have been associated with the digest at the time | ||
- Examples:: | ||
|
||
pkg:oci/debian@sha256:<digest>?repository_url=https://hub.docker.com/_/debian&arch=amd64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the repository_url
is accurate for humans browsing the content on docker hub. Unfortunately, the browsing URL isn't the actual pull URL.
Suggested change:
pkg:oci/debian@sha256:<digest>?repository_url=https://docker.io/library/debian&arch=amd64&tag=latest
docker pull debian:latest
latest: Pulling from library/debian
bb7d5a84853b: Pull complete
Digest: sha256:4d6ab716de467aad58e91b1b720f0badd7478847ec7a18f66027d0f8a329a43c
Status: Downloaded newer image for debian:latest
docker.io/library/debian:latest
➜ docker pull docker.io/library/debian@sha256:4d6ab716de467aad58e91b1b720f0badd7478847ec7a18f66027d0f8a329a43c
docker.io/library/debian@sha256:4d6ab716de467aad58e91b1b720f0badd7478847ec7a18f66027d0f8a329a43c: Pulling from library/debian
Digest: sha256:4d6ab716de467aad58e91b1b720f0badd7478847ec7a18f66027d0f8a329a43c
Status: Image is up to date for debian@sha256:4d6ab716de467aad58e91b1b720f0badd7478847ec7a18f66027d0f8a329a43c
docker.io/library/debian@sha256:4d6ab716de467aad58e91b1b720f0badd7478847ec7a18f66027d0f8a329a43c
PURL-TYPES.rst
Outdated
|
||
pkg:oci/debian@sha256:<digest>?repository_url=https://hub.docker.com/_/debian&arch=amd64 | ||
pkg:oci/debian@sha256:<digest>?tag=bullseye | ||
pkg:oci/metallb@sha256:<digest>?repository_url=registry.io/bitnami |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit, as I've seen comments around avoiding vendor references. Perhaps:
pkg:oci/metallb@sha256:<digest>?repository_url=registry.io/namespace/metalb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re:
minor nit, as I've seen comments around avoiding vendor references. Perhaps:
That's not a big deal... examples are best as real examples, so I am fine with it as long as this is not advertizing.
But registry.io is not a real thing: I would rather have real truncated digest and real URLs than made up ones. Also you may want to make this a URL rather than just a host name.
These are nits anyways that can fixed later and not in the way of merging IMHO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pombredanne I can fix this before merging if you don't mind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go for it :)
PURL-TYPES.rst
Outdated
- ``arch``: key for a package architecture, when relevant | ||
- ``repository_url``: A repository URL where the artifact may be found, but not | ||
intended as the only location. This value is strongly encouraged so the | ||
content may be fetched |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To balance the discussion for fetching, which won't be possible in private network environments, can we tweak this just slightly:
This value is strongly encouraged so the content may be fetched
The value is encouraged to identify a location the artifact may be fetched.
I believe we're only talking about OCI Images now, not the more generic "artifact". |
842e051
to
7556bdc
Compare
@pombredanne nits are all cleaned up. I think it's finally ready fingers crossed |
7556bdc
to
31b8765
Compare
The current purl specification has a proposed `Docker` entry that can be used when referencing docker container images. However, Docker Hub is not the only registry where container images might be stored. Additionally, a container image is not the only type of artifact that may be stored in an oci registry. This commit adds an `oci` package type and points users to use it instead of `docker` for container images. In order to represent all cloud native artifacts, there needs to be an entry in the purl spec that can represent more than just docker container images. Adding the `oci` entry would specifically enable: - a means to uniquely identify any artifact stored in an OCI-compliant registry, regardless of what registry it may currently be in. (Use cases: importing an artifact from a public registry to a private registry, and promotion within and across private registries) Signed-off-by: Rose Judge <rjudge@vmware.com> Signed-off-by: Nisha Kumar <nishak@vmware.com> Signed-off-by: Steve Laskar <steve.lasker@microsoft.com>
31b8765
to
ad8a673
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for working through this @rnjudge (and everyone else for the passionate discussion!).
Given the amount of discussion on this one I'd like @pombredanne or @stevespringett to also approve the PR before we go ahead and merge it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you for pushing this through.
The current purl specification has a proposed
Docker
entry that can beused when referencing docker container images. However, Docker Hub is
not the only registry where container images might be stored.
Additionally, a container image is not the only type of artifact that
may be stored in an oci registry. This commit adds an
oci
package typeand points users to use it instead of
docker
for container images.In order to represent all cloud native artifacts, there needs to be an
entry in the purl spec that can represent more than just docker
container images. Adding the
oci
entry would specificallyenable:
registry, regardless of what registry it may currently be in. (Use
cases: importing an artifact from a public registry to a private
registry, and promotion within and across private registries)
Signed-off-by: Rose Judge rjudge@vmware.com
Signed-off-by: Nisha Kumar nishak@vmware.com
Signed-off-by: Steve Laskar steve.lasker@microsoft.com