Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal to add oci as purl type #123

Merged
merged 1 commit into from
Oct 22, 2021

Conversation

rnjudge
Copy link
Contributor

@rnjudge rnjudge commented Sep 15, 2021

The current purl specification has a proposed Docker entry that can be
used when referencing docker container images. However, Docker Hub is
not the only registry where container images might be stored.
Additionally, a container image is not the only type of artifact that
may be stored in an oci registry. This commit adds an oci package type
and points users to use it instead of docker for container images.

In order to represent all cloud native artifacts, there needs to be an
entry in the purl spec that can represent more than just docker
container images. Adding the oci entry would specifically
enable:

  • a means to uniquely identify any artifact stored in an OCI-compliant
    registry, regardless of what registry it may currently be in. (Use
    cases: importing an artifact from a public registry to a private
    registry, and promotion within and across private registries)

Signed-off-by: Rose Judge rjudge@vmware.com
Signed-off-by: Nisha Kumar nishak@vmware.com
Signed-off-by: Steve Laskar steve.lasker@microsoft.com

PURL-TYPES.rst Outdated Show resolved Hide resolved
PURL-TYPES.rst Outdated Show resolved Hide resolved
PURL-TYPES.rst Outdated Show resolved Hide resolved
PURL-TYPES.rst Outdated Show resolved Hide resolved
PURL-TYPES.rst Outdated Show resolved Hide resolved
PURL-TYPES.rst Outdated Show resolved Hide resolved
PURL-TYPES.rst Outdated Show resolved Hide resolved
PURL-TYPES.rst Outdated Show resolved Hide resolved
@SteveLasker
Copy link

The key thing I keep seeing come up is an assumption that there is a single public endpoint for certain content.
This has always been a problem for dev environments where someone needs to restore packages from a public service. (npm, nuget, debian, ...) Private package feeds has been a solution.
It's becoming more of a problem when we talk about production dependencies, such as nginx, and all the other public images users depend upon.

We wrote this cross-cloud, vendor-neutral post (Consuming Public Content) last year when Docker was introducing throttles to account for the costs of running a public service.

The reality is content is being widely redistributed. ECR Public, GitHub Registry and others. The same nginx image can be retrieved from multiple public endpionts. If it's pulled into a private registry, which should be "the truth"?

This is no different than buying the same part from different distributors. The part is still the same part, regardless of where it was acquired.

This gets more complicated for private content. A company may not want to disclose where the content was initially created, as that may be some security disclosure of an attack vector.

What I'm hoping is achieved is the identify of the entity that created the artifact is addressed in a signature. Did good-co create this artifact abc123, vs. bad-co that created def456.

What I'm hoping we do is adopt a model of uniquely identifying an artifact, completely independent from where it may have been acquired.

PURL-TYPES.rst Outdated Show resolved Hide resolved
PURL-TYPES.rst Outdated
@@ -221,6 +207,34 @@ nuget

pkg:nuget/EnterpriseLibrary.Common@6.0.1304

oci-artifact

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If container images are in scope for this, then I would consider renaming this purl type to just oci. Not all images conform to OCI Artifacts; e.g. multi-platform images don't fit into that model currently.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above where we agreed this would be oci-artifact as images are one of many types of artifacts.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming is as hard as scoping :-) I agree with @jonjohnsonjr if the scope was docker vs oci images we should name it oci or oci-image.

I think longer term though we should probably just scope this as all oci-artifact types stored in one of these oci compatible registries. I think that's best for the OCI consumer? borrowing the definitions from the distribution spec:

- **Registry**: a service that handles the required APIs defined in this specification
- **Client**: a tool that communicates with Registries
- **Push**: the act of uploading Blobs and Manifests to a Registry
- **Pull**: the act of downloading Blobs and Manifests from a Registry
- **Blob**: the binary form of content that is stored by a Registry, addressable by a Digest
- **Manifest**: a JSON document which defines an Artifact. Manifests are defined under the OCI Image Spec <sup>[apdx-2](#appendix)</sup>
- **Config**: a blob referenced in the Manifest which contains Artifact metadata. Config is defined under the OCI Image Spec <sup>[apdx-4](#appendix)</sup>
- **Artifact**: one conceptual piece of content stored as Blobs with an accompanying Manifest containing a Config
- **Digest**: a unique identifier created from a cryptographic hash of a Blob's content. Digests are defined under the OCI Image Spec <sup>[apdx-3](#appendix)</sup>
- **Tag**: a custom, human-readable Manifest identifier

in that context(scope) .. oci-artifacts would mean conceptual pieces of content stored as OCI Blobs with an accompanying OCI Manifest containing a OCI Config

@stevespringett
Copy link
Member

The reality is content is being widely redistributed. ECR Public, GitHub Registry and others. The same nginx image can be retrieved from multiple public endpionts. If it's pulled into a private registry, which should be "the truth"?

This has been true for non-cloud native packages for over a decade. "The truth" is resolved when the package is retrieved from somewhere and wherever it was retrieved from forms the basis of many provenance use cases.

This gets more complicated for private content. A company may not want to disclose where the content was initially created, as that may be some security disclosure of an attack vector.

This has nothing to do with oci-artifact and is true for any purl type. If an org uses an internal repo and they do not want to disclose it, they simply omit (or redact) repository_url. Its quite simple.

What I'm hoping we do is adopt a model of uniquely identifying an artifact, completely independent from where it may have been acquired.

Then what you're looking for is no longer PURL, rather Package ID or something similar. Again, completely outside the scope of the oci-artifact ticket we're discussing. If you want to separate identify from location it needs to be done in the core spec, not in the purl types. However, I will caution that separating these two concepts will likely break compatibility for the millions or users that have already adopted PURL.

@iamwillbar
Copy link
Member

I don't think what @stevespringett and @SteveLasker are saying is contradictory, if we start with the premise that a purl contains both an identity and a location, then as a consumer of an oci-artifact purl you can decide whether repository_url is relevant to what you're using the purl for.

With a second assumption that we believe it's useful to be able to refer to oci-artifact with purls, which I believe it is, the question is then how do we unblock this contribution. I'd like to propose one of the previous suggestions as the strawman:

  1. The default repository for oci-artifact is hub.docker.com.
  2. oci-artifact has an optional repository_url qualifier which can be used to specify an alternate repository.
  3. If you explicitly do not want to disclose the repository, you can set repository_url= (i.e. a blank repository_url).

We can also emphasize that you can compare the identity of an oci-artifact ignoring the repository_url due to the identity portion being a content address that uniquely identifies the content.

Does this seem a reasonable way forward?

@SteveLasker
Copy link

Thanks folks. It is complicated.
@stevespringett, I completely agree this has been an existing problem, and thus the point of the Consuming Public content post.
What we're trying to do is decouple where a package happens to be at any one place, from the entity that owns it.
Microsoft distributes all its OCI Artifact type content from MCR, and NVIDIA distributes its content from the NVIDIA registry, so setting the repository_url can be valid.
However, smaller companies or oss projects that don't host their own registry may distribute their content from docker hub, ecr public, github container registry. Why would a company embed one vs. another?

I'm generally in agreement with Williams's suggestion. I don't think we should default to hub.docker.io, as this could imply a squatting attack, where someone could place a matching named artifact in docker hub.

@iamwillbar
Copy link
Member

I'm generally in agreement with Williams's suggestion. I don't think we should default to hub.docker.io, as this could imply a squatting attack, where someone could place a matching named artifact in docker hub.

Because the name is a content hash they couldn't just place any artifact there, they'd have to place the same artifact there. Or did I miss something? If as the author of the purl you know it's not on hub.docker.com then you would set the repository_url to something (or explicitly set it to nothing).

I know this isn't perfect, just trying to come up with a reasonable compromise in the meantime.

@SteveLasker
Copy link

I'm going to use nginx, but it could be anything.
If nginx is pushed to docker hub, ecr public, quay, ..., it would have the same digest (unique id).
The question is which repository_url should it be set to?

As long as it could be nothing, than I think we're fine.
I think the main issue is whether we can use pURL for identification, and optional location

@rnjudge
Copy link
Contributor Author

rnjudge commented Sep 17, 2021

I'm going to use nginx, but it could be anything.
If nginx is pushed to docker hub, ecr public, quay, ..., it would have the same digest (unique id).
The question is which repository_url should it be set to?

If the image was a malicious dupe and content was changed, then the digest would change, right?

I like @iamwillbar's suggestion and think it's a healthy compromise for what we want to achieve right now :) Thanks to you both for working through this.

@nishakm
Copy link

nishakm commented Sep 17, 2021

@iamwillbar Thanks for helping us understand the nuances of the spec!

It looks like we have consensus on using repository_url as a mandatory component, but can be set to nothing if we're using it for identifying the artifact. If this is the case, may I suggest using an explicit setting for "nothing"? Maybe null or none?
I am not sure how parsers can deal with an incomplete repository_url=. It's also more explicit to state that it is set to nothing.

As for the package type, if we are OK being vague about the types of artifacts we are distributing (it looks like we are considering there is a github package type rather than a git package type and a pypi package type rather than a whl package type), would it be reasonable to group all of the package types which are "content addressable and loosely following OCI specifications" under oci?

@dlorenc
Copy link

dlorenc commented Sep 17, 2021

Thanks for getting this started @rnjudge!

I think this will help with some of the awkwardness we found when adopting PURL for OCI images in TektonCD and In-Toto over here: https://github.com/tektoncd/chains/blob/7d1f010eaf55accb83ce712a11bbe83921cdcc30/pkg/chains/formats/intotoite6/intotoite6.go#L250

A couple high-level questions:

  • The type qualifier feels out of place, what's the rationale for inclusion? It seems a bit redundant and error-prone, since things stored in OCI registries are generally self-describing. From a separation of concerns standpoint, a PURL would be used to identify and locate the object, then the object/API contains information about what type of object it is (Image, Index, other). If the type here didn't match the type in the object/API, I think that could cause issues? I don't see a type qualifier on any of the other pURL types either, even the generic one where it might be useful like here. The maven example seems a bit off, since for that the type is actually required by maven to correctly resolve the artifact, where here it's just additional information for a client.
  • The docker type used version to represent the sha256 digest or the tag, which matches how images are canonically referenced in OCI. What's the rationale for splitting this up into two fields here:version (for the digest) and an optional qualifier tag for the tag? This seems like it would make it harder to for clients to correctly round-trip OCI references through PURL and back.

I don't feel too strongly about the overall name (oci-artifact vs oci), but I do think that leaving both oci-artifact and docker will lead to a lot of confusion around which should be used. The terms docker image and container image are unfortunately used interchangeably, while OCI Artifacts are generally understood to be something other than a container image stored in an OCI registry.

Going forward, I think the new oci type should probably be used in almost all cases instead of docker, but agree with @stevespringett that dropping this would be problematic for backwards compatibility. We should at least provide some guidance around which should be used when, is there any kind of precedent for "deprecating" a field while still leaving it in place?

@dlorenc
Copy link

dlorenc commented Sep 17, 2021

would it be reasonable to group all of the package types which are "content addressable and loosely following OCI specifications" under oci?

+1 here, I think this would lead to the least amount of confusion long term. It also matches the other examples well like you described.

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rnjudge @SteveLasker @nishakm Thank you this PR!

For a start, can you remove the removal of the docker type (or its deprecation) from this PR?

If you feel strongly about removing docker, then a separate PR would be best; but the probability of this removal to be merged is tangentially approaching zero for several reasons highlighted in the discussion here, plus one below.

The same way there is a gopher URL scheme registered at IANA that is still seldom used, we would likely never remove any types from the spec just because a type may be outdated. We could mark some as deprecated though, but this would still be another discussion of its own. More likely, less popular types will fall out of fashion and will stop being used from natural attrition.

As a small nitpicking purl is written lowercase throughout the spec; for consistency, it would be best to use purl rather than the arguably prettier pURL.

About the type name , I feel that oci is a more concise name that could save a lot of typing.

Let me review all the discussions in details over the week-end and provide more feedback!

@rnjudge
Copy link
Contributor Author

rnjudge commented Sep 17, 2021

A couple high-level questions:

* The `type` qualifier feels out of place, what's the rationale for inclusion? It seems a bit redundant and error-prone, since things stored in OCI registries are generally self-describing. From a separation of concerns standpoint, a PURL would be used to identify and locate the object, then the object/API contains information about what type of object it is (`Image`, `Index`, other).  If the type here didn't match the type in the object/API, I think that could cause issues? I don't see a `type` qualifier on any of the other pURL types either, even the `generic` one where it might be useful like here. The maven example seems a bit off, since for that the `type` is actually required by maven to correctly resolve the artifact, where here it's just additional information for a client.

There's a few reasons we want to include this. First off, it's an optional qualifier -- so folks who feel that it's redundant don't have to include it and parsing tools who want to ignore it, can easily do that. Second, it's helpful for the case of using purl in the SPDX spec or any other SBOM using purl that is designed to be more human readable. Specifically, when someone looking at or parsing an SBOM may not use a client tool to verify the artifact type, it could be a helpful hint for them to be able to see the type declared. Of course, it's possible to put a different artifact type that doesn't match the object/API but that's true for any of the fields -- you can put anything you want as long as it's in the right format it's going to technically be a valid purl. It's up to the consumer of the purl to verify its correctness.

* The `docker` type used `version` to represent the sha256 digest or the tag, which matches how images are canonically referenced in OCI. What's the rationale for splitting this up into  two fields here:`version` (for the digest) and an optional qualifier `tag` for the `tag`? This seems like it would make it harder to for clients to correctly round-trip OCI references through PURL and back.

Can you elaborate on your concerns a little bit about how it would make it harder?

I don't feel too strongly about the overall name (oci-artifact vs oci), but I do think that leaving both oci-artifact and docker will lead to a lot of confusion around which should be used. The terms docker image and container image are unfortunately used interchangeably, while OCI Artifacts are generally understood to be something other than a container image stored in an OCI registry.

+1 about the confusion which is why it was proposed to remove the docker entry. However, compatability concerns are understood so we will leave it for now (see below comment)

Going forward, I think the new oci type should probably be used in almost all cases instead of docker, but agree with @stevespringett that dropping this would be problematic for backwards compatibility. We should at least provide some guidance around which should be used when, is there any kind of precedent for "deprecating" a field while still leaving it in place?

I don't know about a "deprecating" field but the plan is to make a note in the docker entry that points/encourages users to use oci artifact until we come up with a more formal way to deprecate docker. I'm waiting for some of the discussion topics to resolve before updating the PR so we don't lose them once I push the changes.

@rnjudge
Copy link
Contributor Author

rnjudge commented Sep 17, 2021

For a start, can you remove the removal of the docker type (or its deprecation) from this PR?

Yes, that is the plan :) I'm just waiting for some of the discussions to clear up before updating the PR so we don't lose the ones happening on the file reviews. Note that I am hoping to add a note under the docker section that urges people to follow the oci-artifact purl examples if this would be approved by the purl team?

If you feel strongly about removing docker, then a separate PR would be best; but the probability of this removal to be merged is tangentially approaching zero for several reasons highlighted in the discussion here, plus one below.

The same way there is a gopher URL scheme registered at IANA that is still seldom used, we would likely never remove any types from the spec just because a type may be outdated. We could mark some as deprecated though, but this would still be another discussion of its own. More likely, less popular types will fall out of fashion and will stop being used from natural attrition.

Deprecating or natural attrition is fine. I think a note in the docker section would speed up this natural attrition. Apologies for the oversight in removing docker from my first PR. We had originally thought docker was only added in 2021 and wouldn't affect compatibility too much.

As a small nitpicking purl is written lowercase throughout the spec; for consistency, it would be best to use purl rather than the arguably prettier pURL.

Thanks for the clarification, Philippe. I will make the necessary changes to lowercase purl.

About the type name , I feel that oci is a more concise name that could save a lot of typing.

This feels like a weak argument when weighed against the justifications for keeping "oci-artifact" :) Most of these purls will be generated automatically so no typing is necessary. Even if it had to be typed out, I can type -artifact in less than 3 seconds ;) Furthermore, oci is a body, similar to cncf or the linux foundation and isn't representative of what we are trying to describe. All things stored in a registry are considered oci-artifacts which is what someone would want to actually represent in their SBOM using purl.

Let me review all the discussions in details over the week-end and provide more feedback!
Hope you don't spend too much of your weekend on this, it can wait for Monday :)

@dlorenc
Copy link

dlorenc commented Sep 17, 2021

This feels like a weak argument when weighed against the justifications for keeping "oci-artifact" :) Most of these purls will be generated automatically so no typing is necessary. Even if it had to be typed out, I can type -artifact in less than 3 seconds ;) Furthermore, oci is a body, similar to cncf or the linux foundation and isn't representative of what we are trying to describe. All things stored in a registry are considered oci-artifacts which is what someone would want to actually represent in their SBOM using purl.

I'll disagree here as well and agree with @pombredanne.

Again, I'm not terribly opinionated on the name but oci-artifact is a very overloaded and confusion-prone term. Not everything stored in an OCI registry is considered an OCI Artifact according to the OCI Artifacts spec. OCI-compliant registries do not even need to support OCI Artifacts, DockerHub and Quay are two examples that don't support them at all yet.

@dlorenc
Copy link

dlorenc commented Sep 17, 2021

Can you elaborate on your concerns a little bit about how it would make it harder?

Here's some pseudocode that I think would need to be written to correctly convert some example container image references to this PURL spec:

ref = "gcr.io/dlorenc-vmtest2/foo:bar"
if ref.has_tag():
  purl.qualifiers.tag = ref.tag()
else:
  purl.version = ref.version()

vs something like:

ref = "gcr.io/dlorenc-vmtest2/foo:bar"
purl.version = ref.tag_or_digest()

@iamwillbar
Copy link
Member

Again, I'm not terribly opinionated on the name but oci-artifact is a very overloaded and confusion-prone term. Not everything stored in an OCI registry is considered an OCI Artifact according to the OCI Artifacts spec. OCI-compliant registries do not even need to support OCI Artifacts, DockerHub and Quay are two examples that don't support them at all yet.

This raises the philosophical question for the PR authors of what is the intent of this purl, is it to refer to an OCI Artifact or to anything stored in an OCI registry? If the former I'd recommend sticking with oci-artifact, if the latter then shortening to oci may make sense.

@rnjudge
Copy link
Contributor Author

rnjudge commented Sep 17, 2021

Can you elaborate on your concerns a little bit about how it would make it harder?

Here's some pseudocode that I think would need to be written to correctly convert some example container image references to this PURL spec:

ref = "gcr.io/dlorenc-vmtest2/foo:bar"
if ref.has_tag():
  purl.qualifiers.tag = ref.tag()
else:
  purl.version = ref.version()

vs something like:

ref = "gcr.io/dlorenc-vmtest2/foo:bar"
purl.version = ref.tag_or_digest()

I want to preface this and say I'm not trying to be snarky here.. but... I'm not sure an extra 3 lines of code can qualify as "making it harder" to create this purl -- it's straightforward to assign a tag and version with the current proposal. Additionally, we're not proposing an if/else for tag and version (digest) -- they can both (and I would argue, should) be included if available. Second, I have an issue with choosing tag or digest for the version. There's value in both of them being there, especially the digest which is the only thing that's actually unique about the artifact you are trying to identify. Using only the tag does not help "reliably identify" the software as purl is intended.

@rnjudge
Copy link
Contributor Author

rnjudge commented Sep 17, 2021

This raises the philosophical question for the PR authors of what is the intent of this purl, is it to refer to an OCI Artifact or to anything stored in an OCI registry? If the former I'd recommend sticking with oci-artifact, if the latter then shortening to oci may make sense.

@iamwillbar It's intended to treat all content in a registry with one purl. If the confusion around oci-artifact is too great, I am ok with changing it to oci even though I also think this is somewhat misleading given than oci is a body of people.

@dlorenc is there another term that could use instead of artifact to go after oci that would help clear things up in your mind?

@nishakm
Copy link

nishakm commented Sep 17, 2021

* The `docker` type used `version` to represent the sha256 digest or the tag, which matches how images are canonically referenced in OCI. What's the rationale for splitting this up into  two fields here:`version` (for the digest) and an optional qualifier `tag` for the `tag`? This seems like it would make it harder to for clients to correctly round-trip OCI references through PURL and back.

The OCI artifact (I am using the term loosely here) digest is the best way to reliably identify this type of artifact. The tag is mutable. However, the digest is not human readable, hence we allow for tag as an optional qualifier. Clients can still pull artifacts based on digest. For example: pkg:oci/foo@sha256<shasum>?repository_url=gcr.io/dlorenc-vmtest2?tag=bar can be translated into gcr.io/dlorenc-vmtest2/foo@sha256<shasum> or gcr.io/dlorenc-vmtest2/foo:bar. The first one gets you the exact artifact while the second one may or may not get you the artifact you want.

@dlorenc
Copy link

dlorenc commented Sep 17, 2021

Additionally, we're not proposing an if/else for tag and version (digest) -- they can both (and I would argue, should) be included if available. Second, I have an issue with choosing tag or digest for the version.

I agree strongly! I'm just worried about doing this in the tooling I work on, where you typically only have one. It's technically possible to include both in the string when pulling an image: with something like gcr.io/dlorenc-vmtest2/foo:bar@sha256... but this is very rarely used. So I agree that setting the field is simple, the challenge is that references usually only contain one of the two so converting to a purl would require extra parsing to first figure out which one has been supplied.

@dlorenc is there another term that could use instead of artifact to go after oci that would help clear things up in your mind?

I can't think of one :( oci-thingy probably doesn't work, but there really isn't a generic word for it in use that I can think of :(

@iamwillbar
Copy link
Member

Would oci-content work? As in, content in a registry. Prefer not to invent a new term not used by the community unless needed.

@dlorenc
Copy link

dlorenc commented Sep 17, 2021

Would oci-content work? As in, content in a registry. Prefer not to invent a new term not used by the community unless needed.

Works for me! Either that oci is fine.

@rnjudge
Copy link
Contributor Author

rnjudge commented Sep 17, 2021

Additionally, we're not proposing an if/else for tag and version (digest) -- they can both (and I would argue, should) be included if available. Second, I have an issue with choosing tag or digest for the version.

I agree strongly! I'm just worried about doing this in the tooling I work on, where you typically only have one. It's technically possible to include both in the string when pulling an image: with something like gcr.io/dlorenc-vmtest2/foo:bar@sha256... but this is very rarely used. So I agree that setting the field is simple, the challenge is that references usually only contain one of the two so converting to a purl would require extra parsing to first figure out which one has been supplied.

Understand the concern here. However, instead of adopting this proposal to use what's convenient for some tooling , we should include the digest/version as a separate entry from the tag instead of the digest/version and tag being either/or. Not to mention, it's best practice when referring to container images to refer to them using their immutable digest and we should strive for this instead of encouraging folks to default to only using the tag. Furthermore, the purpose of the purl spec is to "reliably identify and locate software" and the digest is the only way we can "reliably identify" (aka uniquely identify) oci artifacts in registries -- tags cannot help us here. I'm confident that the extra parsing to differentiate tag from digest is not totally unreasonable for the benefit we get in exchange using the digest as the version.

@dlorenc is there another term that could use instead of artifact to go after oci that would help clear things up in your mind?

I can't think of one :( oci-thingy probably doesn't work, but there really isn't a generic word for it in use that I can think of :(

I think oci-thingy has a nice ring to it myself ;) If oci is the easiest to settle on though, I could be agreeable to that but also still struggling to see the issue with oci-artifact (sorry). From my understanding, the creation of the OCI Aritfacts project was to formally acknowledge the generalization of content stored in a registry. That generalization is referred to as OCI Artifact. Why do images not apply to this?

@jonjohnsonjr
Copy link

From my understanding, the creation of the OCI Aritfacts project was to formally acknowledge the generalization of content stored in a registry. That generalization is referred to as OCI Artifact. Why do images not apply to this?

That is what was proposed but has not happened (yet?). As defined right now, OCI Artifacts are an application of the OCI Image Manifest Specification. It does not generalize to include, for example, the OCI Image Index Specification, which are widely used for distributing multi-platform images.

The dependency graph looks (very roughly) like this:

image

If this proposal is specific to OCI Artifacts, that's fine, but that should be more clear.

If this proposal applies generically to things that are distributable by distribution-spec, that's also fine, but I have some feedback to make this more clear.

@SteveLasker
Copy link

Just a silly question. Why are we trying to reinvent a generic term, when OCI has already defined OCI Artifacts as the generic means to reference content in a distribution-spec based registry:
Open Container Initiative Distribution Specification

While OCI Image is the most prominent, the specification is designed to be agnostic of content types.

Definitions

  • Manifest: a JSON document which defines an Artifact. Manifests are defined under the OCI Image Spec
  • Config: a blob referenced in the Manifest which contains Artifact metadata. Config is defined under the OCI Image Spec
  • Artifact: one conceptual piece of content stored as Blobs with an accompanying Manifest containing a Config

@rnjudge rnjudge changed the title Proposal to add oci-artifact as pURL type Proposal to add oci as purl type Oct 8, 2021
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rnjudge Thank you ++ for the update.. See my few nits inline for your review so we can merge this.

PURL-TYPES.rst Outdated Show resolved Hide resolved
PURL-TYPES.rst Outdated
- The ``name`` is not case sensitive and must be lowercased. The name is the
last fragment of the repository name. For example if the repository
name is ``library/debian`` then the ``name`` is ``debian``.
- The ``version`` is the ``@sha256:digest`` of the artifact and is strongly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The ``version`` is the ``@sha256:digest`` of the artifact and is strongly
- The ``version`` is the ``sha256:hex_encoded_lowercase_value`` of the artifact and is strongly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pombredanne is it possible to make the version field mandatory for oci package types? I know that version is optional according to the spec but wasn't sure if individual types can override this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hex_encoded_lower_case_value could we say hex_encoded_digest? I think it's important to keep the digest terminology as it's something that every artifact stored in an oci registry will have and it's clear what this value would be for this package type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rnjudge I reused hex_encoded_lower_case_value because this is already the term used in the main spec for checksums. I am find with using something else, but digest alone does not tell anything special about a canonical representation (such as lower case) AND it (hex_encoded_lower_case_value) would have to be updated to there

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pombredanne is it possible to make the version field mandatory for oci package types? I know that version is optional according to the spec but wasn't sure if individual types can override this.

You can state that, but that does will not prohibit folks not to use this. Also there is a use to have a purl without a version to point to some package, any version (as mentioned by @iamwillbar in another issue)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rnjudge I reused hex_encoded_lower_case_value because this is already the term used in the main spec for checksums. I am find with using something else, but digest alone does not tell anything special about a canonical representation (such as lower case) AND it (hex_encoded_lower_case_value) would have to be updated to there

@pombredanne hex_encoded_lowercase_digest?

PURL-TYPES.rst Outdated
last fragment of the repository name. For example if the repository
name is ``library/debian`` then the ``name`` is ``debian``.
- The ``version`` is the ``@sha256:digest`` of the artifact and is strongly
recommended to uninquely identify the artifact.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
recommended to uninquely identify the artifact.
recommended to uniquely identify the artifact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo ;) Thank you

PURL-TYPES.rst Show resolved Hide resolved
PURL-TYPES.rst Outdated Show resolved Hide resolved
PURL-TYPES.rst Outdated Show resolved Hide resolved
@dlorenc
Copy link

dlorenc commented Oct 10, 2021

This looks great to me now, thanks @rnjudge!

iamwillbar
iamwillbar previously approved these changes Oct 10, 2021
Copy link
Member

@iamwillbar iamwillbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great. I've made a couple of minor comments/suggestions.

PURL-TYPES.rst Outdated Show resolved Hide resolved
PURL-TYPES.rst Outdated
- There is no canonical package repository for OCI artifacts. Therefore
``oci`` purls must be registry agnostic by default. To specify the repository,
provide a ``repository_url`` value.
- OCI purls do not require a ``namespace``. The ``namespace`` is specific
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not require or do not support? I think it's the latter, in which case I'd say something like:

Suggested change
- OCI purls do not require a ``namespace``. The ``namespace`` is specific
- OCI purls do not contain a ``namespace``, although the ``repository_url`` may contain a namespace as part of the physical location of the package.

PURL-TYPES.rst Show resolved Hide resolved
@rnjudge
Copy link
Contributor Author

rnjudge commented Oct 15, 2021

@pombredanne @iamwillbar nits are resolved on this PR. Please re-review and let me know if there's anything else that needs changing.

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are almost there.... just a dangling leftover that would need to be removed

PURL-TYPES.rst Outdated Show resolved Hide resolved
PURL-TYPES.rst Outdated Show resolved Hide resolved
pombredanne
pombredanne previously approved these changes Oct 15, 2021
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rnjudge Thank you for pushing this through!
I am approving this as it is. There are some minor refinements that I would like to apply, but that's better done in a separate PR and that would be only on the form and not on substance.

Copy link

@SteveLasker SteveLasker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @rnjudge for facilitating this discussion as we hope to use purls for all package references.

A few nits on package_url references, and a tweak on wording around the location. The location is important to fetch the artifact, but the location will change as the artifact is promoted, so the repository_url provided at creation is most likely not the URL when consumed. It could also be internal disclosure of information that could be used to hack a system, so just suggested a slightly relaxed reference.

The decoupling of location from identity nails the minimum requirements to enable purls to be used generically for package references.

While it's unfortunate the artifactType was removed however, It doesn't impact the identity. It was purely fur humans to more easily troubleshoot references and could be added at a later date without breaking existing purls.

PURL-TYPES.rst Outdated
- ``tag``: artifact tag that may have been associated with the digest at the time
- Examples::

pkg:oci/debian@sha256:<digest>?repository_url=https://hub.docker.com/_/debian&arch=amd64

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the repository_url is accurate for humans browsing the content on docker hub. Unfortunately, the browsing URL isn't the actual pull URL.

Suggested change:
pkg:oci/debian@sha256:<digest>?repository_url=https://docker.io/library/debian&arch=amd64&tag=latest

docker pull debian:latest
latest: Pulling from library/debian
bb7d5a84853b: Pull complete
Digest: sha256:4d6ab716de467aad58e91b1b720f0badd7478847ec7a18f66027d0f8a329a43c
Status: Downloaded newer image for debian:latest
docker.io/library/debian:latest

➜ docker pull docker.io/library/debian@sha256:4d6ab716de467aad58e91b1b720f0badd7478847ec7a18f66027d0f8a329a43c
docker.io/library/debian@sha256:4d6ab716de467aad58e91b1b720f0badd7478847ec7a18f66027d0f8a329a43c: Pulling from library/debian
Digest: sha256:4d6ab716de467aad58e91b1b720f0badd7478847ec7a18f66027d0f8a329a43c
Status: Image is up to date for debian@sha256:4d6ab716de467aad58e91b1b720f0badd7478847ec7a18f66027d0f8a329a43c
docker.io/library/debian@sha256:4d6ab716de467aad58e91b1b720f0badd7478847ec7a18f66027d0f8a329a43c

PURL-TYPES.rst Outdated

pkg:oci/debian@sha256:<digest>?repository_url=https://hub.docker.com/_/debian&arch=amd64
pkg:oci/debian@sha256:<digest>?tag=bullseye
pkg:oci/metallb@sha256:<digest>?repository_url=registry.io/bitnami

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit, as I've seen comments around avoiding vendor references. Perhaps:
pkg:oci/metallb@sha256:<digest>?repository_url=registry.io/namespace/metalb

Copy link
Member

@pombredanne pombredanne Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re:

minor nit, as I've seen comments around avoiding vendor references. Perhaps:

That's not a big deal... examples are best as real examples, so I am fine with it as long as this is not advertizing.
But registry.io is not a real thing: I would rather have real truncated digest and real URLs than made up ones. Also you may want to make this a URL rather than just a host name.

These are nits anyways that can fixed later and not in the way of merging IMHO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pombredanne I can fix this before merging if you don't mind

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go for it :)

PURL-TYPES.rst Outdated
- ``arch``: key for a package architecture, when relevant
- ``repository_url``: A repository URL where the artifact may be found, but not
intended as the only location. This value is strongly encouraged so the
content may be fetched

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To balance the discussion for fetching, which won't be possible in private network environments, can we tweak this just slightly:

This value is strongly encouraged so the content may be fetched

The value is encouraged to identify a location the artifact may be fetched.

@sudo-bmitch
Copy link

The location is important to fetch the artifact, but the location will change as the artifact is promoted

I believe we're only talking about OCI Images now, not the more generic "artifact".

@rnjudge
Copy link
Contributor Author

rnjudge commented Oct 21, 2021

@pombredanne nits are all cleaned up. I think it's finally ready fingers crossed

The current purl specification has a proposed `Docker` entry that can be
used when referencing docker container images. However, Docker Hub is
not the only registry where container images might be stored.
Additionally, a container image is not the only type of artifact that
may be stored in an oci registry. This commit adds an `oci` package type
and points users to use it instead of `docker` for container images.

In order to represent all cloud native artifacts, there needs to be an
entry in the purl spec that can represent more than just docker
container images. Adding the `oci` entry would specifically
enable:

- a means to uniquely identify any artifact stored in an OCI-compliant
registry, regardless of what registry it may currently be in. (Use
cases: importing an artifact from a public registry to a private
registry, and promotion within and across private registries)

Signed-off-by: Rose Judge <rjudge@vmware.com>
Signed-off-by: Nisha Kumar <nishak@vmware.com>
Signed-off-by: Steve Laskar <steve.lasker@microsoft.com>
Copy link
Member

@iamwillbar iamwillbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for working through this @rnjudge (and everyone else for the passionate discussion!).

Given the amount of discussion on this one I'd like @pombredanne or @stevespringett to also approve the PR before we go ahead and merge it.

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for pushing this through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.